Hamiltonian simulation in the interaction picture

ABSTRACT

In this disclosure, quantum algorithms are presented for simulating Hamiltonian time-evolution e^(−i(A+B)t) in the interaction picture of quantum mechanics on a quantum computer. The interaction picture is a known analytical tool for separating dynamical effects due to trivial free-evolution A from those due to interactions B. This is especially useful when the energy-scale of the trivial component is dominant, but of little interest. Whereas state-of-the-art simulation algorithms scale with the energy ∥A+B∥≤∥A∥+∥B∥ of the full Hamiltonian, embodiments of the disclosed approach generally scale linearly with the sum of the Hamiltonian coefficients from the low-energy component B and poly-logarithmically with those from A.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 62/656,517 entitled “HAMILTONIAN SIMULATION IN THE INTERACTION PICTURE” and filed on Apr. 12, 2018, which is hereby incorporated herein in its entirety.

SUMMARY

In this disclosure, quantum algorithms are presented for simulating Hamiltonian time-evolution e^(−i(A+B)t) in the interaction picture of quantum mechanics on a quantum computer. The interaction picture is a known analytical tool for separating dynamical effects due to trivial free-evolution A from those due to interactions B. This is especially useful when the energy-scale of the trivial component is dominant, but of little interest. Whereas state-of-the-art simulation algorithms scale with the energy ∥A∥+∥B∥ of the full Hamiltonian, embodiments of the disclosed approach generally scale with only the low-energy component ∥B∥. Applied to simulating a periodic N-site Hubbard model with arbitrary long-ranged density-density interactions, the disclosed gate complexity $\tilde{\mathcal{O}}(N^{2}t)$ is a quadratic improvement in N over prior art for electronic structure simulation in the plane-wave basis. More generally, in the abstract query model, diagonally-dominant Hamiltonians with sparsity d can be simulated with cost independent of the diagonal component. The various contributions disclosed herein include a rigorous analysis of a low-space overhead simulation algorithm based on the Dyson series for general time-dependent Hamiltonians, which also enables a quadratic improvement in sparsity for time-dependent simulation with $\tilde{\mathcal{O}}(td)$ queries.

In some embodiments of the disclosed technology, a quantum computer is configured to simulate a quantum system, and a Hamiltonian in the simulation is represented in the interaction picture. The simulation of the quantum system is then performed using the quantum computer. In certain implementations, the simulation of the quantum system is a subroutine that is repeated two or more times. In some implementations, the simulation is performed using linear combinations of unitaries. For instance, in some cases, the simulation uses linear combinations of unitaries performed on a diagonally dominant matrix (e.g., using linear combinations of unitaries performed on the diagonally dominant components of the diagonally dominant matrix). In certain implementations, the quantum system is modelled by a Hubbard model. In particular implementations, the quantum system describes a physical chemical system or molecule. In some implementations, the Hamiltonian is sparse and the simulation uses a state of an auxiliary qubit to encode the matrix elements of the Hamiltonian instead of using graph decomposition techniques. In certain implementations, the simulation is performed by compressing ancillas for quantum simulation of a time-dependent Hamiltonian.

In further embodiments, a quantum algorithm is implemented on a quantum computer for simulating a general sparse time-dependent quantum system. In these embodiments, the quantum algorithm does not use graph decomposition techniques. In certain implementations, a quantum Hamiltonian used in the simulation is represented in the interaction picture. In some implementations, the simulation uses linear combinations of unitaries performed on a diagonally dominant matrix. In particular implementations, an interaction Hamiltonian is chosen to be a diagonal matrix. In certain implementations, the quantum system is modelled by a Hubbard model, a physical chemical system, or a molecule. In further implementations, the simulation includes compressing ancillas used to index the time for the simulation.

The foregoing and other objects, features, and advantages of the disclosed technology will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic block diagram showing a circuit that implements HAM with normalization constant α=Σ_(j=1) ^(l) α_(j) using example unitary oracles.

FIG. 2 shows quantum circuit representations of the components implementing the truncated Dyson series algorithm TDS.

FIG. 3 is a quantum circuit representation of the gadget V for applying a sequence of probabilistic operators H_(k) . . . H₂H₁, encoded in (⟨0|_(a)⊗I_(s))U_(k)(|0⟩_(a)⊗I_(s))=H_(k), controlled on number state |k⟩_(b), k∈{0,1, . . . , K}.

FIG. 4 depicts the quantum circuit representation of an example implementation of HAM_(K) from Eq. (75) using K queries to controlled-HAM, a single step of the truncated Taylor series algorithm before oblivious amplitude amplification, and a single step of time-evolution by the truncated Taylor series algorithm from Eq. (78).

FIG. 5 is a schematic block diagram showing quantum circuit representations of the unitary DYS_(K) that encodes time-ordered products of Hamiltonians, using fewer ancillas than the construction in FIG. 2 by applying the compression gadget of FIG. 4.

FIG. 6 is an example method for a time-dependent simulation algorithm as disclosed herein.

FIG. 7 is an example interaction picture simulation method as disclosed herein.

FIG. 8 is an example method for simulating chemistry and Hubbard models as disclosed herein.

FIG. 9 is an example compression method as disclosed herein.

FIG. 10 illustrates a generalized example of a suitable classical computing environment in which aspects of the described embodiments can be implemented.

FIG. 11 illustrates an example of a possible network topology (e.g., a client-server network) for implementing a system according to the disclosed technology.

FIG. 12 illustrates another example of a possible network topology (e.g., a distributed computing environment) for implementing a system according to the disclosed technology.

FIG. 13 illustrates an exemplary system for implementing the disclosed technology.

FIG. 14 illustrates an example method for performing a quantum simulation in accordance with an embodiment of the disclosed technology.

FIG. 15 illustrates a further example method for performing a quantum simulation in accordance with an embodiment of the disclosed technology.

DETAILED DESCRIPTION

I. General Considerations

As used in this application, the singular forms “a,” “an,” and “the” include the plural forms unless the context clearly dictates otherwise. Additionally, the term “includes” means “comprises.” Further, the term “coupled” does not exclude the presence of intermediate elements between the coupled items. Further, as used herein, the term “and/or” means any one item or combination of any items in the phrase.

Although the operations of some of the disclosed methods are described in a particular, sequential order for convenient presentation, it should be understood that this manner of description encompasses rearrangement, unless a particular ordering is required by specific language set forth below. For example, operations described sequentially may in some cases be rearranged or performed concurrently. Moreover, for the sake of simplicity, the attached figures may not show the various ways in which the disclosed systems, methods, and apparatus can be used in conjunction with other systems, methods, and apparatus. Additionally, the description sometimes uses terms like “produce” and “provide” to describe the disclosed methods. These terms are high-level abstractions of the actual operations that are performed. The actual operations that correspond to these terms will vary depending on the particular implementation and are readily discernible by one of ordinary skill in the art.

II. Introduction

Many-body dynamics are often described by a Hamiltonian H=A+B with two non-commuting parts: a well-understood but trivial free-particle theory A, and a more interesting component B describing interactions. One often finds a large separation of energy scales ∥A∥≫∥B∥ in nature. For instance, the ultra-violet cutoff of a free scalar field is usually assumed to be significantly higher than the energy scale of its interactions. In practical applications, the energy contribution of electron-electron correlations in molecules and materials is typically larger than chemical accuracy, at 4 kJ mol⁻¹, but still much smaller than the Hartree-Fock component.

Dynamical effects arising from the interaction are often the subject of interest. Whereas the quantum state-vector |φ(t)⟩ evolves under the full Hamiltonian by the Schrödinger equation i∂_(t)|φ(t)⟩=(A+B)|φ(t)⟩, the interaction picture moves to the rotating frame of the trivial dynamics. This frame allows one to focus on effects of the interaction, and is particularly fruitful when interactions are perturbative corrections to the free theory. For instance, an analytic evaluation of the time-ordered propagator of the interaction picture Hamiltonian H_(I)(t)=e^(iAt)Be^(−iAt) is possible by perturbative expansions based on Green's functions and Feynman diagrams. However, such techniques fail for interactions that are non-perturbative, even if they are weak relative to the free theory. In such situations, one resorts to numerical simulations.

Numerical simulations of quantum dynamics scale exponentially with particle number, so the interaction picture provides little advantage here. One then relies on non-perturbative numerical approximations such as density-functional theory or Monte Carlo path-integral evaluation. Though these are highly successful in elucidating qualitative behavior, obtaining accurate quantitative predictions to high precision is difficult. For instance, achieving chemical accuracy in ab-initio quantum chemistry calculations of reaction rates is often only possible by exact diagonalization of the Hamiltonian. This has an exponentially growing cost $e^{\mathcal{O}(n)}$ with the system size n. (The standard big-$\mathcal{O}$ notation defines $f(n)=\mathcal{O}(g(n))$ for positive functions f(n),g(n)>0 as the existence of absolute constants a>0,b>0 such that for any n>a, f(n)≤bg(n). One can also use $f(n)=\tilde{\mathcal{O}}(g(n))$ when f(n)≤bg(n) polylog(g(n)), and f(n)=Ω(g(n)) when f(n)≥bg(n), and f(n)=Θ(g(n)) when both $f(n)=\mathcal{O}(g(n))$ and f(n)=Ω(g(n)) are true.)

One approach that is tractable and non-perturbative is simulating quantum dynamics on a quantum computer. Given an $\mathcal{O}(\mathrm{poly}(n))$-sized description of a time-independent Hamiltonian H, one devises a sequence of $\mathcal{O}(\mathrm{poly}(n))$ primitive quantum gates that approximates the unitary time-evolution operator e^(−iHt) with error ϵ. (Primitive gate costs are defined as the number N of arbitrary single- and two-qubit gates. In fault-tolerant architectures, one multiplies by an additional overhead $\mathcal{O}(\log(N/\epsilon))$ for approximating N such gates to total error ϵ using a universal discrete gate set, e.g., Clifford+T.) This is then applied to the system state |φ(0)⟩ encoded in $\mathcal{O}(\mathrm{poly}(n))$ logical quantum bits to obtain the time-evolved state |φ(t)⟩=e^(−iHt)|φ(0)⟩. Algorithms for Hamiltonian simulation have progressed rapidly, culminating recently in schemes with optimal query complexity $\mathcal{O}(nt+\log(1/\epsilon))$ with respect to all parameters for generic sparse Hamiltonians, and algorithms for geometrically-local exponentially-decaying interactions, with gate complexity $\mathcal{O}(nt\,\mathrm{polylog}(nt/\epsilon))$ that is optimal up to poly-logarithmic factors.

It would be desirable if the cost of simulating time-evolution scaled with the energy of the smaller interaction term, say $\mathcal{O}(t\|B\|)$. As the interaction picture Hamiltonian has norm ∥H_(I)(t)∥=∥B∥ (conjugation by the unitary e^(iAt) preserves the spectral norm), this possibility is highly suggestive. Nevertheless, the state of the art performs simulation in the Schrödinger picture, where the gate cost of simulating e^(−iHt) generally scales at least like Ω(t(∥A∥+∥B∥)log(1/ϵ)), even if the trivial dynamics e^(−iAt) of term A may be evaluated in closed form. Even with specialized techniques that scale with a geometric mean $\mathcal{O}(e^{\mathcal{O}(k)}t\|A\|(t\|B\|/\epsilon)^{1/k})$, ∀k∈ℤ⁺, the uninteresting component $\mathcal{O}(t\|A\|)$ is the dominant pre-factor.
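The norm identity ∥H_(I)(t)∥=∥B∥ used above can be checked numerically. The following is a minimal sketch, assuming small random Hermitian matrices as stand-ins for A and B; the helper names `random_hermitian` and `interaction_hamiltonian` are hypothetical and are not part of the disclosure.

```python
import numpy as np


def random_hermitian(dim, norm, rng):
    """Random dim x dim Hermitian matrix rescaled to spectral norm `norm`."""
    m = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    h = (m + m.conj().T) / 2
    return norm * h / np.linalg.norm(h, 2)


def interaction_hamiltonian(a, b, t):
    """H_I(t) = e^{iAt} B e^{-iAt}, using the eigendecomposition of Hermitian A."""
    w, v = np.linalg.eigh(a)
    frame = (v * np.exp(1j * w * t)) @ v.conj().T  # e^{iAt}
    return frame @ b @ frame.conj().T


rng = np.random.default_rng(0)
A = random_hermitian(8, 100.0, rng)  # dominant "free" term, ||A|| = 100 (assumed)
B = random_hermitian(8, 1.0, rng)    # weak interaction, ||B|| = 1 (assumed)

# Norm of the full Hamiltonian versus norm of H_I(t) at a few sample times.
full_norm = np.linalg.norm(A + B, 2)
frame_norms = [np.linalg.norm(interaction_hamiltonian(A, B, t), 2)
               for t in (0.0, 0.3, 1.7)]
print(full_norm, frame_norms)
```

In this toy instance the full Hamiltonian has norm close to ∥A∥, whereas the rotating-frame Hamiltonian keeps norm ∥B∥ at every sampled time, which is the separation of scales the interaction picture is meant to exploit.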

Unfortunately, simulation in the interaction picture is difficult. There, evolution by a time-independent Hamiltonian H is transformed in the rotating frame |φ_(I)(t)⟩=e^(iAt)|φ(t)⟩ to evolution by a time-dependent Hamiltonian H_(I)(t)=e^(iAt)Be^(−iAt). This follows from an elementary manipulation of the Schrödinger equation:

i∂_(t)|φ(t)⟩=(A+B)|φ(t)⟩ → i∂_(t)|φ_(I)(t)⟩=e^(iAt)Be^(−iAt)|φ_(I)(t)⟩.  (1)

Implementing the time-ordered propagator $\mathcal{T}\left[\exp\left(-i\int_{0}^{t}H_{I}(s)\,ds\right)\right]$ that solves Eq. (1) on a quantum computer requires time-dependent simulation algorithms. These are generally more complicated than time-independent algorithms, and exhibit different cost trade-offs that do not appear favorable. For instance, an order-k time-dependent Trotter-Suzuki product formula has a cost that scales with the rate of change of H(t) like $\mathcal{O}(e^{\mathcal{O}(k)}(t\Lambda)^{1+1/(2k)})$, where $\Lambda=\max_{s}\|\dot{H}(s)\|^{1/2}=\mathcal{O}(\|[A,B]\|^{1/2})$. More advanced techniques based on compressed fractional queries appear to scale better, like ∼$t\|B\|\frac{\log(\Lambda t/\epsilon)}{\log\log(\Lambda t/\epsilon)}$, but in terms of queries to a unitary oracle that obscures the gate complexity, as it expresses Hamiltonian matrix elements at different times in binary and may be difficult to implement in practice. One proposed technique directly implements a truncated Dyson series of the time-ordered propagator and argues, though without proof, a similar scaling in terms of queries to a different type of oracle.

Here, it is shown that simulation in the interaction picture can substantially improve the efficiency of simulation. In Section III, the general time-dependent simulation algorithm by a truncated Dyson series is completed by providing a rigorous analysis of the approximation and explicit circuit constructions, with improvements in gate and space complexity over previously expected costs. In Section IV, situations are identified where the gate complexity of implementing these queries scales with the interaction strength $\mathcal{O}(\|B\|)$, and not the larger uninteresting component $\mathcal{O}(\|A\|)$. Such are the cases where simulation in the interaction picture is advantageous. In Section V, the potential of interaction-picture simulation is illustrated by an electronic structure application in the plane-wave basis. Further, a reduction of the cost of simulating the time-evolution of N spin-orbitals subject to long-range electron-electron interactions to $\tilde{\mathcal{O}}(N^{2}t)$ gates is discussed, which is close to a quadratic improvement over the prior art of $\tilde{\mathcal{O}}(N^{11/3}t)$. In Section VI, a complexity-theoretic perspective of the disclosure is provided by considering the abstract problem of simulating time-dependent sparse Hamiltonians in the standard query model. A quadratic improvement in sparsity scaling is shown, and optimized algorithms for simulating diagonally dominant Hamiltonians are provided. A more detailed summary of the main results in each section is as follows.

In Section III (“Time-dependent Hamiltonian simulation by a truncated Dyson series”), an example of a general time-dependent simulation algorithm, Theorem 1, is provided. The example algorithm is based on approximating a truncation and discretization of the Dyson series for general time-dependent Hamiltonians H(t) characterized by spectral-norm α≥max_(t)∥H(t)∥ and average rate-of-change $\langle\|\dot{H}\|\rangle = \frac{1}{t}\int_{0}^{t}\|\dot{H}(s)\|\,ds$. A rigorous analysis of its performance is provided together with explicit circuit constructions. Bounds on the approximation error are evaluated in Section III A and are used to obtain the cost of simulating the time-ordered evolution operator. This cost is found to be $\mathcal{O}\left(\alpha t\,\frac{\log(\alpha t/\epsilon)}{\log\log(\alpha t/\epsilon)}\right)$ queries to a certain unitary oracle HAM-T that encodes H(t) evaluated at $M = \mathcal{O}\left(\frac{1}{\epsilon}\frac{\langle\|\dot{H}\|\rangle}{\alpha^{3}}\right)$ points. Example implementations as disclosed in Section III B, characterized by Theorem 2, complete the scheme proposed by D. W. Berry, A. M. Childs, and R. Kothari, “Hamiltonian simulation with nearly optimal dependence on all parameters,” in Proceedings of the 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 792-809 (October 2015), with some improvements in gate complexity. One obtains scaling with the average rate-of-change $\mathcal{O}(\log(\langle\|\dot{H}\|\rangle))$, instead of the worst case $\mathcal{O}(\log(\max_{t}\|\dot{H}(t)\|))$. This allows one to simulate Hamiltonians with arbitrary elementwise slew-rates, so long as ∥H(t)∥ remains bounded. One can then devise a compression gadget in Section III C that reduces the qubit overhead complexity of this implementation to $\mathcal{O}(\log(M))$, compared to $\mathcal{O}\left(\log(M)\frac{\log(\alpha t/\epsilon)}{\log\log(\alpha t/\epsilon)}\right)$ of the original. This gadget is of independent interest and is also applicable to the time-independent truncated Taylor series algorithm, discussed in Section X A for completeness.

In Section IV (“Accelerated interaction picture simulation”), this truncated Dyson series algorithm is applied to simulate time-evolution by e^(−i(A+B)t) in the interaction picture. The gate complexity is evaluated in Section IV A for one possible construction of the queries HAM-T for the interaction picture Hamiltonian H_(I)(t)=e^(iAt)Be^(−iAt). This leads to the simulation algorithm of Theorem 3 for time-evolution by e^(−i(A+B)t) with gate complexity

$\mathcal{O}\left(\left(C_{B}+C_{e^{iA/\alpha_{B}}}\right)\alpha_{B}t\,\mathrm{polylog}(\|A\|,\alpha_{B},t,\epsilon)\right),$  (2)

where C_(B) is the gate complexity of a unitary oracle O_(B) such that (⟨0|_(a)⊗I_(s))O_(B)(|0⟩_(a)⊗I_(s))=B/α_(B), and $C_{e^{iA/\alpha_{B}}}$ is the cost of simulating time-evolution by A for time $\mathcal{O}(\alpha_{B}^{-1})$. This may be compared to state-of-the-art Schrödinger picture simulation algorithms for time-independent Hamiltonians, which have gate complexity

$\mathcal{O}\left((C_{B}+C_{A})(\alpha_{B}+\alpha_{A})t\,\mathrm{polylog}(\|A\|,\|B\|,t,\epsilon)\right),$  (3)

where C_(A) is analogous to C_(B) but for the Hamiltonian term A. The disclosed result Eq. (2) is then advantageous in cases where ∥A∥≫∥B∥ and $C_{e^{iA/\alpha_{B}}}=\mathcal{O}(C_{B})$. There, the gate complexity scales with the interaction strength ∥B∥, and not the larger uninteresting component ∥A∥, up to polylogarithmic factors.

In Section V (“Application to the Hubbard model with long-ranged interactions”), a concrete advantage for Hamiltonian simulation in the interaction picture over time-independent simulation algorithms is demonstrated. One can consider a general periodic Hubbard model on N lattice sites in an arbitrary number of dimensions, which has Hamiltonian

$\begin{matrix} {H = \sum_{\vec{x},\vec{y},\sigma} T(\vec{x}-\vec{y})\,a_{\vec{x}\sigma}^{\dagger}a_{\vec{y}\sigma} + \sum_{\vec{x},\sigma} U(\vec{x},\sigma)\,n_{\vec{x}\sigma} + \sum_{(\vec{x},\sigma)\neq(\vec{y},\sigma')} V(\vec{x}-\vec{y})\,n_{\vec{x}\sigma}n_{\vec{y}\sigma'},} & (4) \end{matrix}$

where $a_{\vec{x}\sigma}^{\dagger}$, $a_{\vec{x}\sigma}$ are Fermionic creation and annihilation operators indexed by lattice position $\vec{x}$ and spin σ, and $n_{\vec{x}\sigma}$ is a number operator. Note that one can allow for arbitrary translationally invariant kinetic T(⋅) hopping terms and long-range density-density V(⋅) interactions, subject to arbitrary local disorder U(⋅). Provided that the kinetic term is extensive, embodiments of the disclosed interaction-picture algorithm have gate complexity $\tilde{\mathcal{O}}(N^{2}t)$, independent of U and V up to poly-logarithmic factors. This model generalizes electronic structure simulations in the plane-wave basis, considered in Section V A, where one obtains almost a quadratic improvement over the prior art of $\tilde{\mathcal{O}}(N^{11/3}t)$ gates for this special case. This is complementary to recent work by Jeongwan Haah, Matthew B Hastings, Robin Kothari, and Guang Hao Low, Quantum algorithm for simulating real time evolution of lattice Hamiltonians, arXiv preprint arXiv:1801.03922, 2018, which achieves $\tilde{\mathcal{O}}(Nt)$ scaling, but under the assumption of short-range exponentially decaying coefficients.

In Section VI (“Application to sparse Hamiltonian simulation”), the application to sparse Hamiltonian simulation is explained.

Consider next a complexity-theoretic generalization of the technology that has been developed for time-dependent simulation and simulation in the interaction picture. The standard query model for black-box d-sparse Hamiltonian simulation assumes access to a unitary oracle that provides the positions and values of non-zero entries of the Hamiltonian. Each row has at most d non-zero entries, and the maximum absolute value of any entry is ∥H∥_(max). This information is also provided as a function of a time index for time-dependent Hamiltonians. In Section VI A, this time-dependent case is considered, and Theorem 4 shows that the time-ordered evolution operator may be simulated using $\mathcal{O}\left(td\|H\|_{\max}\frac{\log(td\|H\|_{\max}/\epsilon)}{\log\log(td\|H\|_{\max}/\epsilon)}\right)$ queries. Though linear scaling with respect to d is well-known in the time-independent case, this is a quadratic improvement in sparsity scaling over prior art for the time-dependent case. An analogous treatment for simulating sparse time-independent Hamiltonians in the interaction picture, given in Section VI B as Theorem 5, has query complexity that scales only with ∥H_(od)∥_(max), which is the maximum absolute value of any off-diagonal entry. This improved cost, $\mathcal{O}\left(td\|H_{od}\|_{\max}\frac{\log(td\|H_{od}\|_{\max}/\epsilon)}{\log\log(td\|H_{od}\|_{\max}/\epsilon)}\right)$, is particularly advantageous for the common situation of diagonally dominant Hamiltonians.

III. Time-Dependent Hamiltonian Simulation by a Truncated Dyson Series

The time-dependent simulation technique applied throughout this work is motivated by the Dyson series, which solves the time-dependent Schrödinger equation. The Dyson series is a power-series expansion of the unitary propagator that involves integrals of products of H(t) at different times. The technique implements a truncation of this power series, where the integrals in each term are discretized in time.

This technology is presented with a rigorous analysis of the truncation and discretization error in Section III A, followed in Section III B by an explicit circuit construction that is in the spirit of earlier work. Two improvements are of note. First, the gate complexity of the algorithm scales with the average rate-of-change $\mathcal{O}(\log(\langle\|\dot{H}\|\rangle))$, instead of the worst case $\mathcal{O}(\log(\max_{t}\|\dot{H}(t)\|))$. This allows one to simulate Hamiltonians with arbitrary element-wise slew-rates, so long as ∥H(t)∥ remains bounded. Second, in Section III C, an alternate implementation of the technique is introduced that reduces the space overhead of the original proposal substantially, and has the same query and gate complexity.

The complexity of any simulation algorithm depends on assumptions made about how information on the time-dependent Hamiltonian H(t) is accessed by the quantum computer. This section begins by defining an input model for the unitary quantum oracle HAM-T that encodes H(t). The complexity of time-dependent simulation is then expressed in terms of queries to HAM-T, any additional primitive gates required beyond HAM-T, and the total number of qubits required, including those for HAM-T. (It is assumed that the query complexity to a controlled-unitary black-box is the same as that to the original black-box. In general, this will not affect the gate complexity: though there are often cleverer ways to implement an arbitrary controlled-unitary, in the worst case all quantum gates may be replaced by their controlled versions with constant overhead.) In later sections where the time-dependent simulation algorithm is applied to the interaction picture, this oracle is “opened up” and the gate complexity of its implementation is discussed.

A common assumption in the time-independent situation is that H=Σ_(j=1)^(l)α_(j)H_(j) is a sum of l local Hermitian terms with positive coefficients α_(j), and that each term e^(−iH_(j)t) may be implemented with $\mathcal{O}(1)$ quantum gates for any t. In abstract models for sparse Hamiltonians, a binary representation of matrix entries H_(jk) and positions is returned in superposition by querying a certain unitary quantum oracle that is a natural generalization of classical Boolean circuits. Here, the Hamiltonian H is considered to be encoded in a so-called standard form. For any time-independent Hamiltonian H, it is assumed that there exists a unitary oracle HAM such that

(⟨0|_(a)⊗I_(s))HAM(|0⟩_(a)⊗I_(s))=H/α,  (5)

where HAM acts jointly on registers a, s consisting of n_(a), n_(s) qubits respectively, and α≥∥H∥ is a normalization constant that should be made small for best performance. In other words, post-selected on the n_(a)-qubit measurement outcome |0⟩_(a), the Hamiltonian H (rescaled by 1/α) is applied to any arbitrary state in the system register s. Use of this is justified as it generalizes a variety of different input models. Thus any simulation algorithm that queries Eq. (5) for this general case readily specializes to more structured inputs. As an example, H=Σ_(j=1)^(2^(n_(a)))α_(j)U_(j) could be a linear combination of l=2^(n_(a)) unitaries. Then the circuit depicted in FIG. 1 implements HAM with normalization constant α=Σ_(j=1)^(l)α_(j) using the unitary oracles

$\begin{matrix} {\mathrm{HAM} = (\mathrm{PREP}^{\dagger}\otimes I_{s})\cdot\mathrm{SEL}\cdot(\mathrm{PREP}\otimes I_{s}),\quad \mathrm{PREP}|0\rangle_{a} = \sum_{j=1}^{l}\sqrt{\frac{\alpha_{j}}{\alpha}}\,|j\rangle_{a},\quad \mathrm{SEL} = \sum_{j=1}^{l}|j\rangle\langle j|_{a}\otimes U_{j}.} & (6) \end{matrix}$

These unitaries each cost $\mathcal{O}(l)$ gates: PREP is implemented by arbitrary quantum state preparation on l dimensions, and SEL is implemented by binary-tree control logic for l different inputs. Typically, the number of terms in the Hamiltonian scales like $l=\mathcal{O}(\mathrm{poly}(n_{s}))$, so $n_{a}=\mathcal{O}(\log(l))$. However, the number of ancilla qubits can also be polynomial in n_(s), such as in the simulation of sparse Hamiltonians where n_(a)≥n_(s)+2, or where a space-time complexity trade-off is possible.
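The linear-combination-of-unitaries construction of Eq. (6) can be verified on explicit matrices. The sketch below is illustrative only: it builds PREP as a Householder reflection (one of many unitaries with the required first column), builds SEL as a block-diagonal matrix, and checks that the top-left block of HAM equals H/α as required by Eq. (5). The example coefficients, the choice of Pauli products as the U_(j), and all function names are assumptions made for illustration.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def prep_unitary(coeffs):
    """Real unitary whose first column is sqrt(alpha_j / alpha) (Householder)."""
    coeffs = np.asarray(coeffs, dtype=float)
    amp = np.sqrt(coeffs / coeffs.sum())
    e0 = np.zeros_like(amp)
    e0[0] = 1.0
    v = e0 - amp
    norm = np.linalg.norm(v)
    if norm < 1e-12:                      # amp already equals |0>
        return np.eye(len(amp), dtype=complex)
    v = v / norm
    return (np.eye(len(amp)) - 2.0 * np.outer(v, v)).astype(complex)


def select_unitary(unitaries):
    """SEL = sum_j |j><j|_a (x) U_j, with the ancilla as the leading register."""
    l, dim = len(unitaries), unitaries[0].shape[0]
    sel = np.zeros((l * dim, l * dim), dtype=complex)
    for j, u in enumerate(unitaries):
        sel[j * dim:(j + 1) * dim, j * dim:(j + 1) * dim] = u
    return sel


def ham_oracle(coeffs, unitaries):
    """HAM = (PREP^dagger (x) I_s) . SEL . (PREP (x) I_s)."""
    dim = unitaries[0].shape[0]
    prep = np.kron(prep_unitary(coeffs), np.eye(dim))
    return prep.conj().T @ select_unitary(unitaries) @ prep


# Hypothetical two-qubit Hamiltonian with l = 4 terms (so n_a = 2 ancilla qubits).
coeffs = [0.5, 0.3, 0.15, 0.05]
unitaries = [np.kron(X, I2), np.kron(I2, Z), np.kron(X, X), np.kron(Z, Z)]
alpha = sum(coeffs)
H = sum(c * u for c, u in zip(coeffs, unitaries))

HAM = ham_oracle(coeffs, unitaries)
top_left = HAM[:4, :4]                    # (<0|_a (x) I_s) HAM (|0>_a (x) I_s)
print(np.allclose(top_left, H / alpha))
```

Dense matrices are used here purely for checking the algebra; on a quantum computer PREP and SEL would be compiled into gates as described above.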

FIG. 1 shows a quantum circuit representation of (left) an oracle HAM from Eq. (5) encoding a time-independent Hamiltonian, (center) an oracle HAM-T from Eq. (7) encoding a time-dependent Hamiltonian, and (right) an example implementation of HAM with a linear combination of unitaries from Eq. (6). Bold horizontal lines depict registers that in general comprise multiple qubits. Vertical lines connecting boxes depict unitaries that act jointly on all registers covered by the boxes. A small square box marked by T indicates control by a time index.

A number of techniques may be applied to approximate the unitary time-evolution operator e^(−iHt) to error ϵ by querying HAM. The combined Qubitization and quantum signal processing approach is one such algorithm with query complexity $\mathcal{O}(\alpha t+\log(1/\epsilon))$ that is optimal for sparse Hamiltonians. Alternatively, one could apply a truncated Taylor series of e^(−iHt), which has a slightly worse query and space complexity, but benefits from being conceptually simpler. As the Dyson series algorithm is a time-dependent generalization of the truncated Taylor series algorithm, a review and some useful background are provided in Section X A for completeness.

A natural time-dependent generalization of Eq. (6) is the unitary oracle HAM-T encoding the Hamiltonian H(s) defined over time s∈[0,t], with t>0. The continuous parameter s is discretized into an integer number of M>0 time bins of width Δ=t/M, which are indexed by m∈{0, 1, . . . , M−1}. Here, it is assumed that H(s) is encoded in an analogous format

$\begin{matrix} {(\langle 0|_{a}\otimes I_{s})\,\text{HAM-T}\,(|0\rangle_{a}\otimes I_{s}) = \sum_{m=0}^{M-1}|m\rangle\langle m|_{d}\otimes\frac{H(m\Delta)}{\alpha}.} & (7) \end{matrix}$

Given black-box access to HAM-T, details of a quantum algorithm that approximates the Dyson series Eq. (8) are provided. The overall algorithm is efficient so long as HAM-T has an efficient implementation in terms of primitive quantum gates. More formally, the performance of the algorithm is captured by the following theorem, proven in the remainder of this section.

Theorem 1 (Hamiltonian simulation by a truncated Dyson series). Let a time-dependent Hamiltonian H(s) be characterized by spectral norm α≥max_(s)∥H(s)∥, and average rate-of-change $\langle\|\dot{H}\|\rangle = \frac{1}{t}\int_{0}^{t}\left\|\frac{dH(s)}{ds}\right\|ds$. This Hamiltonian is defined for s∈[0, t], where |αt|≤½, and is encoded in an oracle HAM-T acting on n_(s)+n_(a)+n_(d) qubits, as per Eq. (7), evaluated at $M = \mathcal{O}\left(\frac{t^{2}}{\epsilon}\left(\frac{\langle\|\dot{H}\|\rangle}{\alpha}+\frac{\max_{s}\|H(s)\|^{2}}{\alpha^{2}}\right)\right)$ points, where $n_{d}=\mathcal{O}(\log(M))$. Then the time-ordered evolution operator $\mathcal{T}\left[\exp\left(-i\int_{0}^{t}H(s)\,ds\right)\right]$ may be approximated to error ϵ with the following cost.

1. Queries to HAM-T: $\mathcal{O}\left(\frac{\log(t/\epsilon)}{\log\log(t/\epsilon)}\right)$.
2. Qubits: $n_{s}+\mathcal{O}(n_{a}+\log(M))$.
3. Primitive gates: $\mathcal{O}\left((n_{a}+\log(M))\frac{\log(1/\epsilon)}{\log\log(1/\epsilon)}\right)$.
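The encoding of Eq. (7) assumed by Theorem 1 can be made concrete with explicit matrices. The sketch below is one hypothetical way to realize such an oracle classically for checking purposes: each H(mΔ)/α is embedded in a one-ancilla-qubit unitary dilation, and the dilations are arranged block-diagonally over the time index m, with registers ordered a, d, s. The test Hamiltonian, the value of α, and the function names are assumptions, and this is not the circuit construction of the disclosure.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def psd_sqrt(m):
    """Square root of a Hermitian positive semi-definite matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T


def block_encode(h, alpha):
    """One-ancilla unitary [[x, s], [s, -x]] with x = h/alpha; needs alpha >= ||h||."""
    x = h / alpha
    s = psd_sqrt(np.eye(len(h)) - x @ x)
    return np.block([[x, s], [s, -x]])


def ham_t(h_of_s, t, M, alpha):
    """Dense HAM-T on registers (a, d, s): index = (a * M + m) * dim + system index."""
    delta = t / M
    dim = len(h_of_s(0.0))
    full = np.zeros((2 * M * dim, 2 * M * dim), dtype=complex)
    for m in range(M):
        u = block_encode(h_of_s(m * delta), alpha)
        for i in range(2):          # ancilla row index
            for j in range(2):      # ancilla column index
                r, c = (i * M + m) * dim, (j * M + m) * dim
                full[r:r + dim, c:c + dim] = u[i * dim:(i + 1) * dim,
                                               j * dim:(j + 1) * dim]
    return full


def hamiltonian(s):
    """Hypothetical test Hamiltonian with ||H(s)|| = 1 for all s."""
    return np.cos(s) * X + np.sin(s) * Z


t, M, alpha = 0.5, 4, 1.5
HAM_T = ham_t(hamiltonian, t, M, alpha)
encoded = HAM_T[:M * 2, :M * 2]     # ancilla projected onto |0>_a
print(np.allclose(encoded[2:4, 2:4], hamiltonian(t / M) / alpha))
```

Projecting the ancilla onto |0⟩_(a) leaves a block-diagonal matrix whose m-th block is H(mΔ)/α, which is the right-hand side of Eq. (7).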

Note that this algorithm performs time-evolution for a relatively short duration of t≤α⁻¹. Time-evolution for longer durations is simulated by breaking the time-ordered evolution operator $\mathcal{T}\left[\exp\left(-i\int_{0}^{t}H(s)\,ds\right)\right]$ into $L=\mathcal{O}(\alpha t)$ time-steps 0=t₀<t₁< . . . <t_(L)=t of size $\mathcal{O}(\alpha^{-1})$, where each step is implemented by the quantum circuit underlying Theorem 1, which queries a different HAM-T defined for the Hamiltonian H(s) on s∈[t_(j−1),t_(j)]. As the final error is at most Lϵ by a simple triangle inequality, one can rescale ϵ→ϵ/L. Thus simulation for the full duration to error ϵ has query complexity $\mathcal{O}\left(\alpha t\,\frac{\log(\alpha t/\epsilon)}{\log\log(\alpha t/\epsilon)}\right)$.
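The bookkeeping of this segmentation can be sketched in a few lines. The function below is a hypothetical helper, assuming equal-length segments and taking the constant ½ from the condition |αt|≤½ of Theorem 1; it returns the number of segments L, the segment boundaries t_(j), and the rescaled per-segment error ϵ/L.

```python
import math


def plan_segments(alpha, t, eps):
    """Split [0, t] into L equal steps with alpha * step <= 1/2.

    Returns (L, boundaries, per_step_eps), where per_step_eps = eps / L so that
    the per-step errors add up to at most eps by the triangle inequality.
    """
    L = max(1, math.ceil(2.0 * alpha * t))
    step = t / L
    boundaries = [j * step for j in range(L + 1)]
    return L, boundaries, eps / L


# Hypothetical parameters: ||H(s)|| <= alpha = 4, total time t = 10, error 1e-3.
L, boundaries, per_step_eps = plan_segments(alpha=4.0, t=10.0, eps=1e-3)
print(L, boundaries[1] - boundaries[0], per_step_eps)
```

With these example values there are L=80 segments of length 0.125, and each segment is simulated to error 1.25×10⁻⁵.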

A. Dyson Series Truncation Error

The choice of parameters in Theorem 1 depends on the scaling of error from a finite and discrete approximation of the Dyson series. Consider a unitary propagator U(t) defined to solve the time-dependent Schrödinger equation i∂_(t)|ψ(t)⟩=H(t)|ψ(t)⟩ by providing the state |ψ(t)⟩=U(t)|ψ(0)⟩ given the initial state |ψ(0)⟩ at time t=0. For any t>0 and bounded ∥H(t)∥, U(t) has an absolutely convergent infinite expansion known as the Dyson series

$\begin{matrix} {U(t) = I - i\int_{0}^{t}H(t_{1})\,dt_{1} - \int_{0}^{t}\int_{0}^{t_{2}}H(t_{2})H(t_{1})\,dt_{1}\,dt_{2} + i\int_{0}^{t}\int_{0}^{t_{3}}\int_{0}^{t_{2}}H(t_{3})H(t_{2})H(t_{1})\,dt_{1}\,dt_{2}\,dt_{3} + \ldots.} & (8) \end{matrix}$

This may be compactly represented using the time-ordering operator 𝒯, which sorts any sequence of k operators according to the times t_(j) of their evaluation, that is, 𝒯[H(t_(k)) . . . H(t₂)H(t₁)]=H(t_(σ(k))) . . . H(t_(σ(2)))H(t_(σ(1))), where σ is a permutation such that t_(σ(1))≤t_(σ(2))≤ . . . ≤t_(σ(k)). For instance, 𝒯[H(t₂)H(t₁)]=θ(t₂−t₁)H(t₂)H(t₁)+θ(t₁−t₂)H(t₁)H(t₂) using the Heaviside step function θ. With this notation, the propagator is commonly expressed as a time-ordered evolution operator U(t)=𝒯[e^(−i∫₀^(t)H(s)ds)], and Eq. (8) is written as

$\begin{matrix} {{{\mathcal{T}\left\lbrack e^{{- i}{\int_{0}^{t}{{H{(s)}}{ds}}}} \right\rbrack} = {\sum\limits_{k = 0}^{\infty}{\left( {- i} \right)^{k}D_{k}}}},{D_{k} = {\frac{1}{k!}{\int_{0}^{t}{\ldots{\int_{0}^{t}{{\mathcal{T}\left\lbrack {{H\left( t_{k} \right)}\mspace{14mu}\ldots\mspace{14mu}{H\left( t_{1} \right)}} \right\rbrack}d^{k}{t.}}}}}}}} & (9) \end{matrix}$

The Dyson series of Eq. (9) may be approximated on a computer by truncating the infinite expansion at some finite order k=K≥0, and evaluating H(t_(j)) at some finite number of M time-steps of size Δ=t/M. Thus, one has the approximation

$\begin{matrix} {\mathcal{T}\left\lbrack e^{- i\int_{0}^{t}H(s)ds} \right\rbrack \approx \sum\limits_{k = 0}^{K}( - i)^{k}D_{k} \approx \sum\limits_{k = 0}^{K}\frac{( - it)^{k}}{k!\,M^{k}}{\overset{\sim}{B}}_{k},\quad{\overset{\sim}{B}}_{k} = \sum\limits_{m_{1},\ldots,m_{k} = 0}^{M - 1}\mathcal{T}\left\lbrack H(m_{k}\Delta)\ldots H(m_{1}\Delta) \right\rbrack,} & (10) \end{matrix}$ which converges to U(t) in the limit K, M→∞ if H(t) is Riemann integrable. One can obtain some intuition on how the error of this approximation depends on the choices of K, M, and H(t). As ∥H(t)∥ is bounded,

$\| D_{k} \| \leq \frac{t^{k}}{k!}\max_{s}\| H(s) \|^{k}.$ Using Taylor's remainder theorem for the exponential function, the error of truncation for |t| max_(s) ∥H(s)∥=𝒪(1) is at most ϵ=𝒪(1/K!). Thus the order of truncation is at most

$K = \mathcal{O}\left( \frac{\log(1/\epsilon)}{\log\log(1/\epsilon)} \right).$ As B̃_(k) is a left Riemann sum of D_(k), one expects that the error of discretization should scale like

$\epsilon = \mathcal{O}\left( \frac{1}{M} \right).$ These two errors may then be added by a triangle inequality to obtain the overall error from approximating 𝒯[e^(−i∫₀^(t)H(s)ds)] with

$\sum\limits_{k = 0}^{K}\frac{( - it)^{k}}{M^{k}}B_{k}.$ The explicit time-ordering in B̃_(k) may be removed with a slightly different approximation

$\begin{matrix} {{\overset{\sim}{B}}_{k} = k!\,B_{k} + C_{k},\quad B_{k} = \sum\limits_{0 \leq m_{1} < \ldots < m_{k} < M}H(m_{k}\Delta)\ldots H(m_{1}\Delta),} & (11) \end{matrix}$ where C_(k) captures terms where at least one pair of indices collide, that is, m_(i)=m_(j) for some i≠j. A rigorous statement of this intuition is given by the following Lemma 1, with a detailed proof in Section XI.
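The approximation of Eqs. (10) and (11) can be checked numerically on a small example. The following is a minimal classical sketch, not part of the disclosed quantum algorithm: the two-level Hamiltonian `H`, the helper names, and the use of a fine product of short-time exponentials as the reference propagator are all illustrative assumptions. The sums B_(k) are accumulated with exclusive prefix sums, so the cost is linear in M for each order k.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def H(s):
    # Illustrative two-level Hamiltonian (an assumption); ||H(s)|| = 0.5 for all s.
    return 0.5 * (np.cos(s) * X + np.sin(s) * Z)

def exp_minus_i(Hm, tau):
    # exp(-i * Hm * tau) for a Hermitian matrix Hm, via eigendecomposition.
    w, v = np.linalg.eigh(Hm)
    return (v * np.exp(-1j * w * tau)) @ v.conj().T

def reference_propagator(H, t, steps=5000):
    # Fine time-ordered product of short-time exponentials, used as the reference.
    U = np.eye(H(0.0).shape[0], dtype=complex)
    dt = t / steps
    for j in range(steps):
        U = exp_minus_i(H((j + 0.5) * dt), dt) @ U
    return U

def truncated_dyson(H, t, K, M):
    # Evaluates sum_{k=0}^{K} (-i t / M)^k B_k with B_k as in Eq. (11).
    dt = t / M
    hs = np.array([H(m * dt) for m in range(M)], dtype=complex)
    total = np.eye(hs.shape[1], dtype=complex)
    layer = hs.copy()  # layer[m]: ordered products of length k whose largest index is m
    for k in range(1, K + 1):
        if k > 1:
            # Exclusive prefix sum: sum over m' < m of layer[m'].
            excl = np.cumsum(layer, axis=0) - layer
            layer = hs @ excl  # batched product H(m * dt) @ excl[m]
        total = total + (-1j * t / M) ** k * layer.sum(axis=0)
    return total

if __name__ == "__main__":
    U_ref = reference_propagator(H, 1.0)
    for M in (500, 4000):
        err = np.linalg.norm(truncated_dyson(H, 1.0, 8, M) - U_ref, 2)
        print(M, err)
```

With K fixed, the printed error is expected to shrink roughly in proportion to 1/M, in line with the discretization estimate above.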

Lemma 1 (Error from truncating and discretizing the Dyson series). Let H(s): ℝ→ℂ^(N×N) be differentiable on the domain [0, t], and let

$\left\langle \left\| \dot{H} \right\| \right\rangle := \frac{1}{t}\int_{0}^{t}\left\| \frac{dH(s)}{ds} \right\| ds.$ For any ϵ∈[0, 2^(1−e)], an approximation to the time-ordered operator exponential of −iH(s) can be constructed such that

$\left\| \mathcal{T}\left\lbrack e^{- i\int_{0}^{t}H(s)ds} \right\rbrack - \sum\limits_{k = 0}^{K}\left( - it/M \right)^{k}B_{k} \right\| \leq \epsilon$ if all of the following are true.

1. $\max_{s}\| H(s) \|\, t \leq \ln 2$.
2. $K = \left\lceil - 1 + \frac{2\ln(2/\epsilon)}{\ln\ln(2/\epsilon) + 1} \right\rceil$.
3. $M \geq \max\left\{ \frac{16t^{2}}{\epsilon}\left( \left\langle \| \dot{H} \| \right\rangle + \max_{s}\| H(s) \|^{2} \right),\ K^{2} \right\}$.
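The three conditions of Lemma 1 translate directly into a parameter calculation. The sketch below is illustrative only; the function name `dyson_parameters` and its argument names are assumptions introduced here, and the caller is assumed to supply the two norms.

```python
import math

def dyson_parameters(eps, t, h_max, h_dot_avg):
    """Return (K, M) satisfying conditions 2 and 3 of Lemma 1.

    eps       -- target error, assumed to lie in (0, 2**(1 - e)]
    t         -- evolution time of one segment
    h_max     -- max_s ||H(s)||
    h_dot_avg -- average rate of change <||dH/ds||>
    """
    # Condition 1: the segment must be short enough.
    if h_max * t > math.log(2):
        raise ValueError("Lemma 1 requires max_s ||H(s)|| * t <= ln 2")
    log_term = math.log(2.0 / eps)
    # Condition 2: truncation order.
    K = math.ceil(-1 + 2 * log_term / (math.log(log_term) + 1))
    # Condition 3: number of discretization points.
    M = math.ceil(max(16 * t**2 / eps * (h_dot_avg + h_max**2), K**2))
    return K, M

if __name__ == "__main__":
    print(dyson_parameters(1e-3, 0.5, 1.0, 1.0))
```

For example, with ϵ=10⁻³, t=½ and both norms equal to one, the truncation order is K=5, while M is several thousand; only log(M) enters the qubit count.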

On a quantum computer, it becomes possible to compute the B_(k) exactly and efficiently even if they sum over exponentially many points M. In contrast, computing these Riemann sums on a classical computer would be prohibitive, even by approximate Monte-Carlo sampling, which is exacerbated by the sign problem. However, this efficient quantum computation crucially assumes that information on the terms of the Hamiltonian is made accessible in a certain coherent manner. In the case discussed here, this information is accessed by querying the black-box unitary oracle HAM-T in Eq. (7).

B. Algorithm by Duplicating Control Registers

A quantum algorithm is presented that applies the ϵ-approximation

$\overset{\sim}{U} = \sum\limits_{k = 0}^{K}\frac{( - it)^{k}}{M^{k}}B_{k}$ to the time-ordered evolution operator 𝒯[e^(−i∫₀^(t)H(s)ds)], where the truncation order K and the number of discretization points M are given by Lemma 1. This version is the prelude to Theorem 1 and has worse space complexity. The algorithm proceeds in two steps. First, one synthesizes a unitary quantum circuit DYS_(K) that applies Ũ with success probability ½ on any input state to the s register. As B_(K) contains a product of K Hamiltonians, this step requires 𝒪(K) queries to HAM-T. Second, since Ũ is ϵ-close to unitary, one applies one round of oblivious amplitude amplification to boost this probability to 1−𝒪(ϵ). Among the contributions here are rigorous bounds on K and M, and the implementation of a step not discussed previously: the efficient preparation of a particular quantum state that correctly selects a desired linear combination of time-ordered products of Hamiltonians. This step is non-obvious as the state has 𝒪(M!) different amplitudes, and in the worst case would take 𝒪(M!) gates to create by arbitrary state preparation techniques. The cost of this implementation is captured by the following theorem.

Theorem 2 (Hamiltonian simulation by a truncated Dyson series with duplicated registers). Let a time-dependent Hamiltonian H(s) be characterized by spectral norm α≥max_(s) ∥H(s)∥, and average rate-of-change

$\left\langle {\overset{.}{H}} \right\rangle = {\frac{1}{t}{\int_{0}^{t}{{\frac{{dH}(s)}{ds}}{{ds}.}}}}$ This Hamiltonian is defined for s ∈ [0,t], where |αt|≤½, and is encoded in an oracle HAM-T acting on n_(s)+n_(α)+n_(d) qubits, as per Eq. (7), evaluated at

$M = {\mathcal{O}\left( {\frac{t^{2}}{\epsilon}\left( {\frac{\left\langle {\overset{.}{H}} \right\rangle}{\alpha} + \frac{\max_{s}{{H(s)}}^{2}}{\alpha^{2}}} \right)} \right)}$ points, where n_(d)=

(log (M)). Then the time-ordered evolution operator

[exp (−i∫₀ ^(t)H(s)ds)] may be approximated to error ϵ with the following cost.

${1.\mspace{14mu}{Queries}\mspace{14mu}{to}\mspace{14mu}{HAM}\text{-}T\text{:}\mspace{14mu}{{\mathcal{O}\left( \frac{\log\left( {t/\epsilon} \right)}{\log\;{\log\left( {t/\epsilon} \right)}} \right)}.2.}\mspace{14mu}{Qubits}\text{:}\mspace{14mu} n_{s}} + {{{\mathcal{O}\left( {n_{a} + {{\log(M)}\frac{\log\left( {t/\epsilon} \right)}{\log\;{\log\left( {t/\epsilon} \right)}}}} \right)}.3.}\mspace{14mu}{Primitive}\mspace{14mu}{gates}\text{:}\mspace{14mu}{{\mathcal{O}\left( {\left( {n_{a} + {\log(M)}} \right)\frac{\log\left( {1/\epsilon} \right)}{\log\;{\log\left( {1/\epsilon} \right)}}} \right)}.}}$ Proof Let HAM-T_(K) be a unitary that acts jointly on registers s, {right arrow over (α)}, {right arrow over (b)}, c, {right arrow over (d)}. This unitary is depicted in FIG. 2 and is defined to apply products of Hamiltonians

$\begin{matrix} {\left( \langle 0|_{\vec{a}} \otimes I_{s} \right)\text{HAM-T}_{K}\left( |0\rangle_{\vec{a}} \otimes I_{s} \right) := \left( \sum\limits_{k = 0}^{K}|k\rangle\langle k|_{\vec{b}} \otimes \left( \sum\limits_{\vec{m} \in \lbrack M\rbrack^{k}}|\vec{m}\rangle\langle\vec{m}|_{d_{1}\ldots d_{k}} \otimes I_{d_{k + 1}\ldots d_{K}} \otimes \left( \prod\limits_{j = 1}^{k}H(m_{j}\Delta) \right) \right) + \ldots \right) \otimes \text{SWAP}_{c},} & (12) \end{matrix}$ where SWAP_(c) swaps the two qubits of register c. Note that the action of HAM-T_(K) is only defined for input states to register b⃗ that are spanned by basis states of the unary encoding |k⟩_(b⃗)=|0⟩^(⊗k)|1⟩^(⊗(K−k)), which determines the number of terms in the product. As seen in the figure, HAM-T_(K) makes K queries to HAM-T and copies the a, b, and d registers K times.

FIG. 2 shows quantum circuit representations of the components implementing the truncated Dyson series algorithm TDS.

Now, consider a state |s_(k)⟩_(d⃗), which is a uniform superposition of time-index states under the dimension-k unit simplex

$\begin{matrix} {|s_{k}\rangle_{\vec{d}} := \sqrt{\frac{k!\,(M - k)!}{M!}}\left( \sum\limits_{0 \leq m_{1} < m_{2} < \ldots < m_{k} < M}|\vec{m}\rangle_{d_{1}\ldots d_{k}} \right)|0\rangle_{d_{k + 1}\ldots d_{K}}.} & (13) \end{matrix}$ This state is easy to prepare when k=1: there, it is simply a uniform superposition over M number states, and costs 𝒪(log M) gates. Otherwise, naive methods based on rejection sampling have some success probability |γ_(k)|² that decreases exponentially with large k. Let PREP_(K) be one such unitary that prepares |s_(k)⟩_(d⃗) on measurement outcome |00⟩_(c):

$\begin{matrix} {\text{PREP}_{K}|k\rangle_{\vec{b}}|0\rangle_{c\vec{d}} := |k\rangle_{\vec{b}}\left( \gamma_{k}|00\rangle_{c}|s_{k}\rangle_{\vec{d}} + \sqrt{1 - |\gamma_{k}|^{2}}\,|01\rangle_{c}\ldots \right).} & (14) \end{matrix}$ For each order k, the Riemann sum B_(k) may be implemented by DYS_(K):=(PREP_(K)^(†)⊗I_(a⃗s))·HAM-T_(K)·(PREP_(K)⊗I_(a⃗s)), as depicted in FIG. 2. The unitary DYS_(K) encodes precisely the terms B_(k) of the Dyson series as follows

$\begin{matrix} {\left( \langle 0|_{\vec{a}c\vec{d}} \otimes I_{\vec{b}s} \right)\text{DYS}_{K}\left( |0\rangle_{\vec{a}c\vec{d}} \otimes I_{\vec{b}s} \right) = \sum\limits_{k = 0}^{K}|k\rangle\langle k|_{\vec{b}} \otimes \frac{|\gamma_{k}|^{2}\,k!\,(M - k)!}{M!}B_{k}.} & (15) \end{matrix}$

Now, a linear combination of Dyson series terms is implemented by preparing a state with the appropriate amplitudes in the basis |k⟩_(b⃗). The required state preparation operators are

$\begin{matrix} {\text{COEF}_{K}|0\rangle_{\vec{b}} := \frac{1}{\sqrt{\beta}}\sum\limits_{k = 0}^{K}\sqrt{\frac{M!\,( - it)^{k}}{M^{k}|\gamma_{k}|^{2}\,k!\,(M - k)!}}\,|k\rangle_{\vec{b}},\quad\text{COEF}_{K}^{\prime}|0\rangle_{\vec{b}} := \frac{1}{\sqrt{\beta}}\sum\limits_{k = 0}^{K}\sqrt{\frac{M!\,t^{k}}{M^{k}|\gamma_{k}|^{2}\,k!\,(M - k)!}}\,|k\rangle_{\vec{b}},\quad\beta = \sum\limits_{k = 0}^{K}\frac{M!\,t^{k}}{M^{k}|\gamma_{k}|^{2}\,k!\,(M - k)!},} & (16) \end{matrix}$ and may be implemented using 𝒪(K) primitive gates. Up to a proportionality factor β, one can obtain the desired linear combination for simulating time-evolution.

$\begin{matrix} {\text{TDS}_{\beta} := \left( \text{COEF}_{K}^{\prime\dagger} \otimes I_{\vec{a}c\vec{d}s} \right) \cdot \text{DYS}_{K} \cdot \left( \text{COEF}_{K} \otimes I_{\vec{a}c\vec{d}s} \right),\quad\left( \langle 0|_{\vec{a}\vec{b}c\vec{d}} \otimes I_{s} \right)\text{TDS}_{\beta}\left( |0\rangle_{\vec{a}\vec{b}c\vec{d}} \otimes I_{s} \right) = \frac{1}{\beta}\sum\limits_{k = 0}^{K}\frac{( - it)^{k}}{M^{k}}B_{k} \approx \frac{\mathcal{T}\left\lbrack e^{- i\int_{0}^{t}H(s)\,ds} \right\rbrack}{\beta}.} & (17) \end{matrix}$ If one chooses t to be sufficiently small such that β=2, one can then obtain a single time-step of the truncated Dyson series algorithm TDS in FIG. 2. All that remains is to find an implementation of PREP_(K) that prepares |s_(k)⟩_(d⃗) with an amplitude |γ_(k)| that is sufficiently large so that t=Θ(1).

The state |s_(k)⟩_(b⃗cd⃗) can be prepared in a number of ways. The most straightforward approach creates a uniform superposition of states over the dimension-k hypercube using n_(d)×k Hadamard gates HAD, then uses k reversible adders to flag states |m⃗⟩_(d₁…d_(k)) with the correct ordering. This circuit produces |s_(k)⟩_(d⃗) with amplitude

$\gamma_{k} = \sqrt{\frac{M!}{M^{k}\,k!\,(M - k)!}}.$ PREP_(K) is then obtained by controlling this circuit on the input state |k⟩_(b⃗). Thus

$\begin{matrix} {\beta = \sum\limits_{k = 0}^{K}t^{k} \leq \sum\limits_{k = 0}^{\infty}t^{k} = \frac{1}{1 - t}.} & (18) \end{matrix}$ Thus by choosing t=Θ(1)≈½, β=2 and a single round of oblivious amplitude amplification suffices. Notably, even though the success probability of naive state preparation |γ_(k)|² decays rapidly, this only amounts to a constant factor slowdown compared to more sophisticated techniques that effectively prepare |s_(k)⟩_(d⃗) with success probability≈1. For example, rather than rejection sampling, one may perform a reversible sort on a uniform superposition of states

$\frac{1}{\sqrt{M^{k}}}\sum_{\vec{m}}|\vec{m}\rangle_{d_{1}\ldots d_{k}}|0\rangle_{\text{garbage}} \rightarrow \frac{1}{\sqrt{M^{k}}}\sum_{\vec{m}}|\mathcal{T}\left\lbrack m_{d_{1}}\ldots m_{d_{k}} \right\rbrack\rangle_{d_{1}\ldots d_{k}}|\vec{m}\rangle_{\text{garbage}},$ such as with the quantum bitonic sorting network. This effectively increases γ_(k)² by a factor of k!, and uses significantly more ancilla qubits, but ultimately allows one to implement time steps t≈ln 2≈0.693 that are larger by a constant factor. □
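The amplitude γ_(k) and the normalization β of Eq. (18) can be sanity-checked by brute force for small M and k. The following is an illustrative classical sketch; the function names are assumptions introduced here. It counts the fraction of index tuples in the k-dimensional hypercube that are strictly increasing, which is the success probability |γ_(k)|² of the rejection-sampling preparation, and compares it with the closed form M!/(M^k k!(M−k)!).

```python
import math
from itertools import product

def ordered_fraction(M, k):
    # Brute-force fraction of tuples in [M]^k with m_1 < m_2 < ... < m_k.
    hits = sum(
        1
        for m in product(range(M), repeat=k)
        if all(m[i] < m[i + 1] for i in range(k - 1))
    )
    return hits / M**k

def gamma_squared(M, k):
    # Closed form |gamma_k|^2 = M! / (M^k k! (M - k)!).
    return math.comb(M, k) / M**k

def beta(t, K):
    # Normalization of Eq. (18) for the rejection-sampling preparation.
    return sum(t**k for k in range(K + 1))

if __name__ == "__main__":
    for k in range(1, 5):
        print(k, ordered_fraction(6, k), gamma_squared(6, k))
    print(beta(0.5, 10), 1 / (1 - 0.5))
```

The rapid decay of the ordered fraction with k is absorbed into the coefficients of Eq. (16), which is why β stays bounded by 1/(1−t).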

C. Compression of Control Registers

The dominant contribution to the space overhead of the truncated Dyson series algorithm in Section III B is the K-fold duplication of registers a⃗, d⃗ in the oracle DYS_(K). A general technique is now presented that, when applied to DYS_(K), avoids this duplication and completes the proof of Theorem 1. Suppose one has a sequence of K unitary oracles U₁, U₂, . . . , U_(K), defined as

$\begin{matrix} {\left( \langle 0|_{a} \otimes I_{s} \right)U_{k}\left( |0\rangle_{a} \otimes I_{s} \right) = H_{k},} & (19) \end{matrix}$ where H_(k) is some arbitrary matrix with bounded spectral norm ∥H_(k)∥≤1. In the general problem, one can construct a quantum circuit V that, when controlled on index k in register b, applies the sequence U_(k) . . . U₂U₁, that is

$\begin{matrix} {\left( \langle 0|_{ac} \otimes I_{s} \right)V\left( |0\rangle_{ac} \otimes I_{s} \right) = |0\rangle\langle 0|_{b} \otimes I_{s} + \sum\limits_{k = 1}^{K}|k\rangle\langle k|_{b} \otimes \left( \prod\limits_{j = 1}^{k}H_{j} \right).} & (20) \end{matrix}$ Though binary control logic for this sequence is trivial when H_(k) is unitary, the complication here is that H_(k) is in general non-unitary and so the probability of successfully measuring |0⟩_(a) is less than one. Any other measurement outcome corresponds to failure, as the operator applied is then not H_(k). This complication is overcome by introducing two more registers b, c of size 𝒪(log(K)) qubits that count the number of successful measurements.
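The complication described above can be seen in a small numerical example. The sketch below is illustrative only: it uses a textbook one-ancilla-qubit dilation for Hermitian H with ∥H∥≤1 (an assumption made here for simplicity, not the HAM-T oracle of the disclosure), and the function name `block_encode` is hypothetical. It shows that simply multiplying two encoding unitaries does not encode H₂H₁, whereas projecting the ancilla onto |0⟩ between the two applications does.

```python
import numpy as np

def block_encode(H):
    # Unitary [[H, S], [S, -H]] with S = sqrt(I - H^2); valid for Hermitian H, ||H|| <= 1.
    w, v = np.linalg.eigh(H)
    S = (v * np.sqrt(np.clip(1.0 - w**2, 0.0, None))) @ v.conj().T
    return np.block([[H, S], [S, -H]])

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H1, H2 = 0.6 * X, 0.8 * Z
U1, U2 = block_encode(H1), block_encode(H2)

n = H1.shape[0]
# Projector onto the ancilla state |0> (ancilla is the leading tensor factor).
P0 = np.diag([1.0] * n + [0.0] * n).astype(complex)

naive = (U2 @ U1)[:n, :n]           # ancilla never checked between steps
flagged = (U2 @ P0 @ U1)[:n, :n]    # ancilla projected onto |0> after U1

if __name__ == "__main__":
    print(np.allclose(flagged, H2 @ H1))  # the flagged branch encodes H2 H1
    print(np.allclose(naive, H2 @ H1))    # the naive product does not
```

The counter registers b and c play the role of the intermediate projection coherently, by recording whether every step left the ancilla in |0⟩.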

Lemma 2 (Compression gadget). Let {U_(k), k∈[K]} be a set of K unitaries that encode matrices H_(k) as defined in Eq. (19). Then there exists a quantum circuit V satisfying Eq. (20) such that the number of qubits n_(b)=𝒪(log(K)), n_(c)=𝒪(log(K)). The cost of V is one query to each of controlled-controlled-U_(k), and 𝒪(K(n_(a)+log(K))) additional primitive quantum gates.

Proof. Let b, c be counter registers. Using notation where commas in the subscript indicate respective assignments, these represent an n_(b,c)-bit integer l_(b,c)=Σ_(r=0)^(n_(b,c)−1)2^(r)q_(r) in the number state |l_(b,c)⟩_(b,c):=|q₀q₁ . . . q_(n_(b,c)−1)⟩_(b,c), where q_(r)∈{0, 1}. The sizes of these integers are determined by n_(b)=n_(c)+1=⌈log₂(K+1)⌉+1. The unitaries U_(k) will be applied conditional on the trailing bit q₀=0 in the c register, and the leading bit q_(n_(b)−1)=0 in the b register, that is

$\begin{matrix} {\text{CC-}U_{k} := I^{\otimes (n_{b} + n_{c} - 2)} \otimes \left( |0\rangle\langle 0|_{b_{n_{b} - 1}} \otimes |0\rangle\langle 0|_{c_{0}} \otimes U_{k} + \ldots \right).} & (21) \end{matrix}$

Consider the circuit in FIG. 3. There, one can apply CC-U_(k), then increment k by one, increment l_(c) by one conditional on the a register not being in the |0⟩_(a) state, and decrement l_(b) by one conditional on the a register being in the |0⟩_(a) state. This is accomplished by multiply-controlled modular addition

$\begin{matrix} {\text{ADD}_{ca} = \text{ADD}_{b}^{\dagger} \otimes I_{c} \otimes |0\rangle\langle 0|_{a} + I_{b} \otimes \text{ADD}_{c} \otimes \sum\limits_{l = 1}^{2^{n_{a}} - 1}|l\rangle\langle l|_{a},\quad\text{ADD}_{b,c} = \sum\limits_{l = 0}^{2^{n_{b,c}} - 1}|l + 1 \bmod 2^{n_{b,c}}\rangle\langle l|_{b,c}.} & (22) \end{matrix}$

As one adds integers of size 𝒪(K), each application of modular addition costs 𝒪(log(K)) primitive gates and requires 𝒪(log(K)) qubits. Implementing the multiple controls costs 𝒪(n_(a)) primitive gates and up to n_(a) extra qubits.

FIG. 3 is a quantum circuit representation of the gadget V for applying a sequence of probabilistic operators H_(k) . . . H₂H₁, encoded in (⟨0|_(a)⊗I_(s))U_(k)(|0⟩_(a)⊗I_(s))=H_(k), controlled on number state |k⟩_(b), k∈{0,1, . . . , K}.

Restricted to input states |0⟩_(a)|l⟩_(b)|0⟩_(c), where l∈{2^(n_(b))−1,0,1,2,3, . . . , K−1}, this implements V. For example, consider the evolution of an input state |0⟩_(a)|1⟩_(b)|0⟩_(c)|ψ⟩_(s) for K=3.

$\begin{matrix} {\begin{aligned} |0\rangle_{a}|1\rangle_{b}|0\rangle_{c}|\psi\rangle_{s} &\underset{\text{CC-}U_{1}}{\rightarrow} |0\rangle_{a}|1\rangle_{b}|0\rangle_{c}H_{1}|\psi\rangle_{s} + |0^{\bot.1}\rangle_{a}|2\rangle_{b}|0\rangle_{c}\ldots \\ &\underset{\text{ADD}_{ca}}{\rightarrow} |0\rangle_{a}|0\rangle_{b}|0\rangle_{c}H_{1}|\psi\rangle_{s} + |0^{\bot.1}\rangle_{a}|2\rangle_{b}|1\rangle_{c}\ldots \\ &\underset{\text{CC-}U_{2}}{\rightarrow} |0\rangle_{a}|0\rangle_{b}|0\rangle_{c}H_{2}H_{1}|\psi\rangle_{s} + |0^{\bot.2}\rangle_{a}|1\rangle_{b}|0\rangle_{c}\ldots + |0^{\bot.1}\rangle_{a}|2\rangle_{b}|1\rangle_{c}\ldots \\ &\underset{\text{ADD}_{ca}}{\rightarrow} |0\rangle_{a}| - 1\rangle_{b}|0\rangle_{c}H_{2}H_{1}|\psi\rangle_{s} + |0^{\bot.2}\rangle_{a}|1\rangle_{b}|1\rangle_{c}\ldots + |0^{\bot.1}\rangle_{a}|2\rangle_{b}|2\rangle_{c}\ldots \\ &\underset{\text{CC-}U_{3}}{\rightarrow} |0\rangle_{a}| - 1\rangle_{b}|0\rangle_{c}H_{2}H_{1}|\psi\rangle_{s} + |0^{\bot.2}\rangle_{a}|1\rangle_{b}|1\rangle_{c}\ldots + |0^{\bot.1}\rangle_{a}|2\rangle_{b}|2\rangle_{c}\ldots \\ &\underset{\text{ADD}_{ca}}{\rightarrow} |0\rangle_{a}| - 2\rangle_{b}|0\rangle_{c}H_{2}H_{1}|\psi\rangle_{s} + |0^{\bot.2}\rangle_{a}|1\rangle_{b}|2\rangle_{c}\ldots + |0^{\bot.1}\rangle_{a}|2\rangle_{b}|3\rangle_{c}\ldots \end{aligned}} & (23) \end{matrix}$ In the above, negative integers correspond to the leading bit q_(n_(b)−1)=1, so U_(k) in Eq. (21) is not applied. Note that Eq. (20) applies H_(k) . . . H₁ controlled on |k⟩_(b), so one can simply relabel k=l_(b)+1 mod 2^(n_(b)).

Proof of Theorem 1. Thus DYS_(K) may be implemented with reduced ancilla overhead through Lemma 2 provided that one finds a sequence {U_(k)} such that H_(k) . . . H₂H₁∝B_(k), in other words,

$\begin{matrix} {\left( \langle 0|_{ac,\text{others}} \otimes I_{bs} \right)\text{DYS}_{K}\left( |0\rangle_{ac,\text{others}} \otimes I_{bs} \right) = \sum\limits_{k = 0}^{K}|k\rangle\langle k|_{b} \otimes \gamma_{k}B_{k},} & (24) \end{matrix}$ where 'others' represents registers with size independent of K, and γ_(k) is a scaling factor that depends on the choice of U_(k). This sequence is obtained by combining three matrices. First, a unitary matrix U that prepares a uniform superposition

$U|0\rangle_{d} = \sum\limits_{m = 0}^{M - 1}\frac{1}{\sqrt{M}}|m\rangle_{d}.$ Second, the block-diagonal matrix D=Σ_(m=0)^(M−1)|m⟩⟨m|⊗H(Δm) implemented by HAM-T. Third, a strictly upper-triangular matrix G∈ℝ^(M×M) with elements

$\begin{matrix} {G_{ij} = \begin{cases} \frac{1}{M}, & i < j, \\ 0, & \text{otherwise}, \end{cases}\quad G = \frac{1}{M}\sum\limits_{i = 0}^{M - 1}\sum\limits_{j = i + 1}^{M - 1}|i\rangle\langle j|.} & (25) \end{matrix}$ The non-unitary triangular operator G is implemented by using an integer comparator COMP acting on registers d, e, f consisting of n_(d)=n_(e) and n_(f)=1 qubits. For any input number state index |j⟩_(d), one compares j with a uniform superposition state

$\sum\limits_{i = 0}^{M - 1}\frac{1}{\sqrt{M}}|i\rangle_{e}.$ Conditional on i≥j, perform a NOT gate on register f. One can then swap registers d, e, and unprepare the uniform superposition. On input |j⟩_(d)|0⟩_(e)|0⟩_(f), this implements the sequence

$\begin{matrix} {\begin{aligned} |j\rangle_{d}|0\rangle_{e}|0\rangle_{f} &\underset{U\text{ on }e}{\rightarrow} \frac{|j\rangle_{d}}{\sqrt{M}}\sum\limits_{i = 0}^{M - 1}|i\rangle_{e}|0\rangle_{f} \underset{\text{COMP}}{\rightarrow} \frac{|j\rangle_{d}}{\sqrt{M}}\sum\limits_{i = 0}^{M - 1}|i\rangle_{e}|i \geq j\rangle_{f} \\ &\underset{\text{SWAP}_{de}}{\rightarrow} \frac{|j\rangle_{e}}{\sqrt{M}}\sum\limits_{i = 0}^{M - 1}|i\rangle_{d}|i \geq j\rangle_{f} \underset{U^{\dagger}\text{ on }e}{\rightarrow} \frac{1}{M}\sum\limits_{i = 0}^{M - 1}|i\rangle_{d}|0\rangle_{e}|i \geq j\rangle_{f} + \ldots, \end{aligned}} & (26) \end{matrix}$ where |i≥j⟩_(f)=|1⟩_(f) if i≥j and is |0⟩_(f) if i<j. This defines the following circuit that encodes G.

$\begin{matrix} {\text{LT} = \left( I_{f} \otimes U \otimes I_{d} \right) \cdot \text{COMP} \cdot \left( I_{f} \otimes \text{SWAP}_{de} \right) \cdot \left( I_{f} \otimes U^{\dagger} \otimes I_{d} \right),\quad\left( \langle 0|_{ef} \otimes I_{d} \right)\text{LT}\left( |0\rangle_{ef} \otimes I_{d} \right) = \frac{1}{M}\sum\limits_{i = 0}^{M - 1}\sum\limits_{j = i + 1}^{M - 1}|i\rangle\langle j|_{d}.} & (27) \end{matrix}$ The gate complexity of G is dominated by the comparator COMP, which costs 𝒪(n_(d)) primitive quantum gates.
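The sequence of Eq. (26) can be simulated classically for small M to confirm that it encodes the matrix G of Eq. (25). The sketch below is illustrative: the function name `encoded_G` is an assumption, the unitary U is taken to be the normalized discrete Fourier transform (one of many unitaries whose first column is the uniform superposition), and the gates are applied in the order in which they act on the state.

```python
import numpy as np

def encoded_G(M):
    # U: any unitary with U|0> uniform; here the normalized DFT matrix (an assumption).
    U = np.fft.fft(np.eye(M)) / np.sqrt(M)
    G_enc = np.zeros((M, M), dtype=complex)
    for j in range(M):
        # State tensor indexed as psi[d, e, f]; start in |j>_d |0>_e |0>_f.
        psi = np.zeros((M, M, 2), dtype=complex)
        psi[j, 0, 0] = 1.0
        # U on register e.
        psi = np.einsum("ab,dbf->daf", U, psi)
        # COMP: flip f when the value i in e satisfies i >= j (value in d).
        out = np.zeros_like(psi)
        for d in range(M):
            for e in range(M):
                flip = 1 if e >= d else 0
                for f in range(2):
                    out[d, e, f ^ flip] = psi[d, e, f]
        # SWAP registers d and e.
        psi = out.transpose(1, 0, 2)
        # U^dagger on register e.
        psi = np.einsum("ab,dbf->daf", U.conj().T, psi)
        # Project e and f onto |0>; what remains on d is column j of the encoded block.
        G_enc[:, j] = psi[:, 0, 0]
    return G_enc

if __name__ == "__main__":
    M = 4
    G = np.triu(np.ones((M, M)), k=1) / M
    print(np.allclose(encoded_G(M), G))
```

For M=4 the recovered block equals the strictly upper-triangular matrix with entries 1/M above the diagonal.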

FIG. 5 is a schematic block diagram showing quantum circuit representations of the unitary DYS_(K) that encodes time-ordered products of Hamiltonians, using fewer ancilla than the construction in FIG. 2 by applying the compression gadget in FIG. 4 .

One may then verify that the terms B_(k) are generated by the following sequence

$\begin{matrix} {\begin{aligned} \langle 0|_{d}\,U^{\dagger} \cdot D \cdot U\,|0\rangle_{d} &= \frac{B_{1}}{M} = \frac{1}{M}\sum\limits_{m_{1} = 0}^{M - 1}H(\Delta m_{1}), \\ \langle 0|_{d}\,U^{\dagger} \cdot (D \cdot G) \cdot D \cdot U\,|0\rangle_{d} &= \frac{B_{2}}{M^{2}} = \frac{1}{M^{2}}\sum\limits_{0 \leq m_{1} < m_{2} < M}H(\Delta m_{2})H(\Delta m_{1}), \\ &\vdots \\ \langle 0|_{d}\,U^{\dagger} \cdot (D \cdot G)^{k - 1} \cdot D \cdot U\,|0\rangle_{d} &= \frac{B_{k}}{M^{k}} = \frac{1}{M^{k}}\sum\limits_{0 \leq m_{1} < m_{2} < \ldots < m_{k} < M}H(\Delta m_{k})\ldots H(\Delta m_{2})H(\Delta m_{1}). \end{aligned}} & (28) \end{matrix}$ Thus one can make the choice

$\begin{matrix} {U_{k} := \begin{cases} \left( U^{\dagger} \otimes I_{aefs} \right) \cdot \left( \text{HAM-T} \otimes I_{ef} \right) \cdot \left( U \otimes I_{aefs} \right), & k = 1, \\ \left( U^{\dagger} \otimes I_{aefs} \right) \cdot \left( \text{HAM-T} \otimes I_{ef} \right) \cdot \left( \text{LT} \otimes I_{as} \right) \cdot \left( U \otimes I_{aefs} \right), & k > 1. \end{cases}} & (29) \end{matrix}$ Combined with Lemma 2, this leads to the circuit of FIG. 5 , which implements DYS_(K) in Eq. (24) by identifying 'others' with the e and f registers, and scaling factor

$\gamma_{k} = \frac{1}{M^{k}}.$

$\begin{matrix} {\left( \langle 0|_{acef} \otimes I_{bs} \right)\text{DYS}_{K}\left( |0\rangle_{acef} \otimes I_{bs} \right) = \sum\limits_{k = 0}^{K}|k\rangle\langle k|_{b} \otimes \frac{B_{k}}{M^{k}}.} & (30) \end{matrix}$

One can then select the desired linear combination of different orders in the Dyson series with the states

$\begin{matrix} {\text{COEF}_{K}|0\rangle_{b} = \frac{1}{\sqrt{\beta}}\sum\limits_{k = 0}^{K}\sqrt{( - it)^{k}}\,|k\rangle_{b},} & (31) \end{matrix}$ $\begin{matrix} {\text{COEF}_{K}^{\prime}|0\rangle_{b} = \frac{1}{\sqrt{\beta}}\sum\limits_{k = 0}^{K}\sqrt{t^{k}}\,|k\rangle_{b},\quad\beta = \sum\limits_{k = 0}^{K}t^{k} \leq \sum\limits_{k = 0}^{\infty}t^{k} = \frac{1}{1 - t},} & (32) \end{matrix}$ where one assumes that t<1 for convergence. Thus by choosing t=Θ(1)≈½, the success probability of approximating 𝒯e^(−i∫₀^(t)H(s)ds) may be boosted from β⁻²=¼ to 1−𝒪(ϵ) using a single round of oblivious amplitude amplification. □

IV. Accelerated Interaction Picture Simulation

Time-independent Hamiltonians H become time-dependent H_(I)(t) in the interaction picture. This requires the use of time-dependent Hamiltonian simulation algorithms, which scale with parameters of H_(I)(t) that differ from those for the time-independent case. For certain broad classes of Hamiltonians identified in Section IV A, these different dependencies allow one to improve the gate complexity of approximating the time-evolution operator e^(−iHt) by instead performing simulation in the interaction picture using the truncated Dyson series algorithm of Theorem 1.

The interaction picture can be viewed as an intermediate between the Schrödinger and Heisenberg pictures wherein some of the dynamics is absorbed into the state and the remainder is absorbed into the dynamics of the operators. If the Hamiltonian in the Schrödinger picture is H=A+B and |ψ(t)⟩=e^(−iHt)|ψ(0)⟩, then the Hamiltonian in the interaction picture is H_(I)(t)=e^(iAt)Be^(−iAt) and i∂_(t)|ψ_(I)(t)⟩=H_(I)(t)|ψ_(I)(t)⟩ with |ψ_(I)(t)⟩=e^(iAt)|ψ(t)⟩ for all t. These relations can easily be seen by substituting into the Schrödinger equation:

$\begin{matrix} {i\partial_{t}|\psi_{I}(t)\rangle = i\partial_{t}\left( e^{iAt}|\psi(t)\rangle \right) = e^{iAt}( - A + H)|\psi(t)\rangle = e^{iAt}Be^{- iAt}e^{iAt}|\psi(t)\rangle = H_{I}(t)|\psi_{I}(t)\rangle.} & (33) \end{matrix}$ Note that if one started with time-dependent B(t), that is H(t)=A+B(t), the interaction picture Hamiltonian is H_(I)(t)=e^(iAt)B(t)e^(−iAt). The following results generalize easily to this situation, and so one can consider time-independent B for simplicity.

The advantage of this representation is a Hamiltonian with a smaller norm ∥H_(I)(t)∥=∥B∥≤∥A∥+∥B∥, but at the price of introducing time-dependence. In general, one cannot write a closed form expression for the time-evolution operator. The following notation is commonly used to express the time evolution operator: 𝒯[e^(−i∫₀^(t)H(s)ds)]=lim_(r→∞)Π_(j=1)^(r)e^(−iH(jt/r)t/r), where this product is implicitly defined to be time ordered. Given an initial state |ψ(0)⟩, the state after evolution for t>0 may thus be written as

$\begin{matrix} {|\psi(t)\rangle = e^{- iAt}|\psi_{I}(t)\rangle = e^{- iAt}\mathcal{T}\left\lbrack e^{- i\int_{0}^{t}H_{I}(s)ds} \right\rbrack|\psi_{I}(0)\rangle = e^{- iAt}\mathcal{T}\left\lbrack e^{- i\int_{0}^{t}H_{I}(s)ds} \right\rbrack|\psi(0)\rangle = \left( e^{- iA\tau}\mathcal{T}\left\lbrack e^{- i\int_{0}^{\tau}H_{I}(s)ds} \right\rbrack \right)^{L}|\psi(0)\rangle,} & (34) \end{matrix}$ and evolution by the full duration t is decomposed into evolution by L shorter segments of duration τ=t/L.
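The identity e^(−iHt)=e^(−iAt)𝒯[e^(−i∫₀^(t)H_(I)(s)ds)] underlying Eq. (34) can be checked numerically. The following classical sketch is illustrative only: the matrices A and B, the helper names, and the approximation of the time-ordered exponential by a fine product of short-time exponentials are assumptions made for the example.

```python
import numpy as np

def exp_minus_i(Hm, tau):
    # exp(-i * Hm * tau) for a Hermitian matrix Hm, via eigendecomposition.
    w, v = np.linalg.eigh(Hm)
    return (v * np.exp(-1j * w * tau)) @ v.conj().T

def interaction_picture_evolution(A, B, t, steps=4000):
    # Approximates e^{-iAt} T[exp(-i int_0^t H_I(s) ds)], H_I(s) = e^{iAs} B e^{-iAs}.
    U_I = np.eye(A.shape[0], dtype=complex)
    dt = t / steps
    step_B = exp_minus_i(B, dt)
    for j in range(steps):
        s = (j + 0.5) * dt
        rot = exp_minus_i(A, -s)  # e^{+iAs}
        # exp(-i H_I(s) dt) = e^{iAs} exp(-i B dt) e^{-iAs}; later times act on the left.
        U_I = rot @ step_B @ rot.conj().T @ U_I
    return exp_minus_i(A, t) @ U_I

# Illustrative example: a large diagonal part A and a weaker coupling B.
A = np.diag([0.0, 3.0, 5.0, 9.0]).astype(complex)
B = (0.3 * (np.ones((4, 4)) - np.eye(4))).astype(complex)

if __name__ == "__main__":
    U_exact = exp_minus_i(A + B, 1.0)
    U_ip = interaction_picture_evolution(A, B, 1.0)
    print(np.linalg.norm(U_ip - U_exact, 2))
```

In this example ∥A∥ is ten times ∥B∥, yet the time-dependent Hamiltonian that must be simulated has norm ∥B∥ at every time.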

Using the simulation algorithm Theorem 1 to simulate each segment in Eq (34) leads to the following result

Lemma 3 (Query complexity of Hamiltonian simulation in the interaction picture). For any Hamiltonian H=A+B, let HAM-T be a unitary oracle that encodes H_(I)(t)=e^(iAt)Be^(−iAt) at $M = \mathcal{O}\left( \frac{t}{\epsilon}\frac{\|A\|}{\alpha_{B}} \right)$ uniformly spaced values of t ∈ [0, τ], with $\tau = \mathcal{O}\left( \alpha_{B}^{- 1} \right)$:

$\begin{matrix} {\left( \left\langle 0 \right|_{a} \otimes I_{s} \right)\text{HAM-T}\left( \left| 0 \right\rangle_{a} \otimes I_{s} \right) = \sum\limits_{m = 0}^{M - 1}\left| m \right\rangle\left\langle m \right|_{d} \otimes \frac{e^{iA\tau m/M}Be^{- iA\tau m/M}}{\alpha_{B}},\quad\alpha_{B} \geq \| B\|.} & (35) \end{matrix}$ Then e^(−iHt) may be approximated to error ϵ using

-   -   1. Queries to HAM-T: $\mathcal{O}\left( \alpha_{B}t\frac{\log\left( \alpha_{B}t/\epsilon \right)}{\log\log\left( \alpha_{B}t/\epsilon \right)} \right)$.

-   -   2. Queries to e^(−iAτ): $\mathcal{O}\left( \alpha_{B}t \right)$.

-   -   3. Qubits: $n_{s} + \mathcal{O}\left( n_{a} + \log(M) \right)$.

-   -   4. Primitive gates: $\mathcal{O}\left( \left( n_{a} + \log(M) \right)\alpha_{B}t\frac{\log\left( \alpha_{B}t/\epsilon \right)}{\log\log\left( \alpha_{B}t/\epsilon \right)} \right)$.

Proof. The number of segments $L = \mathcal{O}\left( \alpha_{B}t \right)$ in Eq. (34) is determined by a normalization constant α_(B)≥max_(t)∥H_(I)(t)∥=∥B∥, and so each segment is of size $\tau = \mathcal{O}\left( \alpha_{B}^{- 1} \right)$. After rescaling ϵ→ϵ/L by the number of segments, the total query complexity is obtained from the truncated Dyson series algorithm of Theorem 1. Using the facts max_(s)∥H_(I)(s)∥≤∥B∥, $\langle\|{\dot{H}}_{I}\|\rangle = \|\lbrack A,B\rbrack\| \leq 2\| A\|\| B\|$, and $\tau = \mathcal{O}\left( \alpha_{B}^{- 1} \right)$, it suffices to choose

$\begin{matrix} {M = \mathcal{O}\left( \frac{\alpha_{B}t\tau^{2}}{\epsilon}\left( \frac{\langle\|{\dot{H}}_{I}\|\rangle}{\alpha_{B}} + \frac{\max_{s}\| H_{I}(s)\|^{2}}{\alpha_{B}^{2}} \right) \right) = \mathcal{O}\left( \frac{t}{\epsilon}\frac{\| A\|}{\alpha_{B}} \right).} & (36) \end{matrix}$ Each segment is simulated using

$\mathcal{O}\left( \frac{\log\left( \alpha_{B}t/\epsilon \right)}{\log\log\left( \alpha_{B}t/\epsilon \right)} \right)$ queries to HAM-T, thus simulation for the full duration has query complexity

$\mathcal{O}\left( \alpha_{B}t\frac{\log\left( \alpha_{B}t/\epsilon \right)}{\log\log\left( \alpha_{B}t/\epsilon \right)} \right).$
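To get a feel for the scalings in Lemma 3, the quantities L, τ, M, and the query counts can be evaluated for sample inputs. The sketch below is illustrative only: every constant hidden by the big-O notation is set to 1, which is an assumption rather than something the lemma states, and the function name and the sample values are hypothetical.

```python
import math

def interaction_picture_parameters(norm_A, alpha_B, t, eps):
    """Evaluate the Lemma 3 scalings with all hidden constants set to 1 (assumption)."""
    L = max(1, math.ceil(alpha_B * t))            # number of segments, L = O(alpha_B t)
    tau = t / L                                   # segment duration, tau = O(1/alpha_B)
    M = max(1, math.ceil(t * norm_A / (eps * alpha_B)))  # time-grid points, Eq. (36)
    x = max(alpha_B * t / eps, math.e ** math.e)  # guard so that log(log(x)) >= 1
    per_segment = math.log(x) / math.log(math.log(x))
    return {
        "L": L,
        "tau": tau,
        "M": M,
        "ham_t_queries": math.ceil(L * per_segment),
        "exp_A_queries": L,                       # one e^{-iA tau} per segment
        "time_register_qubits": math.ceil(math.log2(M)),
    }

params = interaction_picture_parameters(norm_A=1e4, alpha_B=10.0, t=5.0, eps=1e-3)
print(params)
```

In this example ∥A∥ enters only through M, and therefore only through the logarithmic size of the time register, while the query counts are set by α_(B)t.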

A. Comparison with Simulation of Time-Independent Hamiltonians in the Schrödinger Picture

The cost of simulation in the interaction picture using the truncated Dyson series is now compared with state-of-art simulation in the Schrödinger picture with time-independent Hamiltonians using the truncated Taylor series approach outlined in section X A. Up to logarithmic factors, this comparison is valid as the truncated Taylor series algorithm cost differs from optimal algorithms by only logarithmic factors.

For any Hamiltonian H=A+B, let us assume access to the oracles

$\begin{matrix} {{{\left( {\left\langle 0❘ \right._{a} \otimes I_{s}} \right){O_{A}\left( {\left. ❘0 \right\rangle_{a} \otimes I_{s}} \right)}} = \frac{A}{\alpha_{A}}},{{\left( {\left\langle 0❘ \right._{a} \otimes I_{s}} \right){O_{B}\left( {\left. ❘0 \right\rangle_{a} \otimes I_{s}} \right)}} = \frac{B}{\alpha_{B}}},} & (37) \end{matrix}$ which have gate complexity C_(A),C_(B) respectively. The gate complexity of time-independent simulation e^(−i(A+B)t) is then

$\begin{matrix} \begin{matrix} {C_{TTS} = \mathcal{O}\left( \left( C_{A} + C_{B} + n_{a} \right)\left( \alpha_{A} + \alpha_{B} \right)t\frac{\log\left( \left( \alpha_{A} + \alpha_{B} \right)t/\epsilon \right)}{\log\log\left( \left( \alpha_{A} + \alpha_{B} \right)t/\epsilon \right)} \right)} \\ {= \mathcal{O}\left( \left( C_{A} + C_{B} + n_{a} \right)\left( \alpha_{A} + \alpha_{B} \right)t\,\text{polylog}\left( \alpha_{A},\alpha_{B},t,\epsilon \right) \right).} \end{matrix} & (38) \end{matrix}$ In the interaction picture, one can prove the following theorem.

Theorem 3 (Gate complexity of Hamiltonian simulation in the interaction picture). For any Hamiltonian H=A+B, let

-   -   1. C_(B) be the gate cost of implementing the oracle

${\left( {\left\langle 0❘ \right._{a} \otimes I_{s}} \right){O_{B}\left( {\left. ❘0 \right\rangle_{a} \otimes I_{s}} \right)}} = {\frac{B}{\alpha_{B}}.}$

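-   -   2. C_(e^(−iAt))[ϵ]=$\mathcal{O}$(|t| polylog(t,ϵ)) be the gate cost of approximating e^(−iAt) to error ϵ.

Then the total gate complexity of simulating e^(−iHt) is

$\begin{matrix} {C_{TDS} = \mathcal{O}\left( \left( C_{B} + C_{e^{iA\tau}}\lbrack\epsilon\rbrack + n_{a} \right)\alpha_{B}t\,\text{polylog}\left( \| A\|,\alpha_{B},t,\epsilon \right) \right),} & (39) \end{matrix}$ where $\tau = \mathcal{O}\left( \alpha_{B}^{- 1} \right)$ is the duration of one segment.

Proof. Let C_(HAM-T)[ϵ] be the gate complexity of approximating HAM-T in Eq. (35) to error ϵ. Then from Lemma 3, the total gate complexity of simulating $e^{- iHt} = \left( e^{- iA\tau}\mathcal{T}\left\lbrack e^{- i\int_{0}^{\tau}H_{I}(s)ds} \right\rbrack \right)^{t/\tau}$, $\tau = \mathcal{O}\left( \alpha_{B}^{- 1} \right)$, is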

$\begin{matrix} {\mathcal{O}\left( \left( C_{\text{HAM-T}}\left\lbrack \frac{\alpha_{B}t}{\epsilon} \right\rbrack + n_{a} + \log\left( \frac{\| A\| t}{\epsilon\alpha_{B}} \right) \right)\left( \alpha_{B}t\frac{\log\left( \alpha_{B}t/\epsilon \right)}{\log\log\left( \alpha_{B}t/\epsilon \right)} \right) \right).} & (40) \end{matrix}$ One possible decomposition of HAM-T is

$\begin{matrix} {\text{HAM-T} =} & (41) \end{matrix}$ $\begin{matrix} {\left( \sum\limits_{m = 0}^{M - 1}\left| m \right\rangle\left\langle m \right|_{d} \otimes I_{a} \otimes e^{iA\tau m/M} \right) \cdot \left( I_{d} \otimes O_{B} \right) \cdot \left( \sum\limits_{m = 0}^{M - 1}\left| m \right\rangle\left\langle m \right|_{d} \otimes I_{a} \otimes e^{- iA\tau m/M} \right).} & (42) \end{matrix}$ Using C_(e^(−iAt))[ϵ]=$\mathcal{O}$(|t| polylog(t,ϵ)), synthesis of Σ_(m=0)^(M−1)|m⟩⟨m|_(d)⊗e^(iAτm/M) from the controlled powers e^(iAτ/M), e^(iA2τ/M), e^(iA4τ/M), . . . , e^(iA2^(⌊log₂(M)⌋)τ/M) has cost

$\sum_{j}C_{e^{iA\tau 2^{j}/M}}\left\lbrack \epsilon/\log(M) \right\rbrack = \mathcal{O}\left( C_{e^{iA\tau}}\left\lbrack \epsilon/\log(M) \right\rbrack \right)$. Thus $C_{\text{HAM-T}}\lbrack\epsilon\rbrack = \mathcal{O}\left( C_{B} + C_{e^{iA\tau}}\left\lbrack \epsilon/\log(M) \right\rbrack \right)$. □
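The decomposition of Eqs. (41)-(42) and the binary-power synthesis used in the proof can be illustrated with small dense matrices. This is a sketch under simplifying assumptions: A is taken to be diagonal, M is a power of two, and the block-encoding ancilla of O_(B) is dropped so that B itself stands in for O_(B). The names `exp_iA` and `select_from_binary_powers` and all numerical values are hypothetical.

```python
import numpy as np

def exp_iA(a_diag, s):
    """e^{iAs} for a diagonal A specified by its diagonal entries."""
    return np.diag(np.exp(1j * a_diag * s))

def select_from_binary_powers(a_diag, tau, M):
    """Build sum_m |m><m| (x) e^{iA tau m/M} from controlled e^{iA tau 2^j/M}."""
    n = len(a_diag)
    n_bits = M.bit_length() - 1            # assumes M is a power of two
    sel = np.eye(M * n, dtype=complex)
    for j in range(n_bits):
        W = exp_iA(a_diag, tau * 2**j / M)
        ctrl = np.zeros((M * n, M * n), dtype=complex)
        for m in range(M):
            # apply W to the system only when bit j of the time index m is 1
            ctrl[m*n:(m+1)*n, m*n:(m+1)*n] = W if (m >> j) & 1 else np.eye(n)
        sel = ctrl @ sel
    return sel

a_diag = np.array([0.0, 2.0, 5.0])
B = np.array([[0, 1, 0.5], [1, 0, 1j], [0.5, -1j, 0]], dtype=complex)
tau, M, n = 0.3, 8, 3

SEL = select_from_binary_powers(a_diag, tau, M)
ham_t = SEL @ np.kron(np.eye(M), B) @ SEL.conj().T

# every diagonal block m should equal H_I(tau m/M) = e^{iA tau m/M} B e^{-iA tau m/M}
max_dev = max(
    np.linalg.norm(ham_t[m*n:(m+1)*n, m*n:(m+1)*n]
                   - exp_iA(a_diag, tau * m / M) @ B @ exp_iA(a_diag, -tau * m / M))
    for m in range(M)
)
print(f"largest block deviation: {max_dev:.1e}")
```

Only ⌊log₂(M)⌋ distinct controlled evolutions are needed, and their durations sum to less than τ, which is the reason the cost is bounded by that of a single evolution e^(iAτ) up to the rescaled error.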

From comparing Eqs. (38) and (39), one may immediately state sufficient criteria for when simulation in the interaction picture is advantageous over simulation in the Schrödinger picture.

-   -   1. The upper bounds on the spectral norms α_(A)≥∥A∥, α_(B)≥∥B∥ of the encoding in Eq. (37) satisfy α_(A)≫α_(B). Generally speaking, this is correlated with term A representing fast dynamics, ∥A∥≫∥B∥.

-   -   2. The gate complexity of time-evolution by A alone for time $\tau = \mathcal{O}\left( \alpha_{B}^{- 1} \right)$ is comparable to that of synthesizing the oracle O_(B), that is $C_{e^{iA\alpha_{B}^{- 1}}} = \mathcal{O}\left( C_{B} \right)$.

Note that satisfying condition (2) depends strongly on the structure of A, B. For instance, a simulation of A for time $\tau = \mathcal{O}\left( \alpha_{B}^{- 1} \right)$ using generic time-independent techniques has gate complexity $\mathcal{O}\left( C_{A}\alpha_{A}/\alpha_{B} \right)$. As the interest is in the case ∥A∥/∥B∥≫1, this quantity could be large and scale poorly with the problem size. One sufficient possibility is the very strong assumption that e^(−iAt) is cheap and can be fast-forwarded, such that the gate complexity C_(e^(−iAt))[ϵ]=$\mathcal{O}$(polylog(t,ϵ)) is constant up to logarithmic factors. This turns out to be a reasonable assumption in the application considered next.

V. Application to the Hubbard Model with Long-Ranged Interactions

One can now apply the technology developed in Section III and Section IV for Hamiltonian simulation in the interaction picture to physical problems of practical interest. Here, the focus is on the Hubbard model in d dimensions with N lattice sites subject to local disorder and translationally invariant two-body couplings that may be long-ranged in general. One can perform a gate complexity comparison with simulation by time-independent techniques, and later in Section V A, this model is specialized to that of quantum chemistry simulations in the plane-wave and dual bases.

The Hubbard Hamiltonian considered has the form H=T+U+V, where T is the kinetic energy hopping operator, U is the local single-site potential, and V is a symmetric translationally-invariant two-body density coupling term between opposite spins. In the dual basis, H is expressed in terms of single-site Fermionic creation and annihilation operators satisfying $\{ a_{\vec{x}\sigma},a_{\vec{y}\sigma'}\} = \{ a_{\vec{x}\sigma}^{\dagger},a_{\vec{y}\sigma'}^{\dagger}\} = 0$ and $\{ a_{\vec{x}\sigma},a_{\vec{y}\sigma'}^{\dagger}\} = \delta_{\vec{x}\vec{y}}\delta_{\sigma\sigma'}$, and the number operator $n_{\vec{x}\sigma} = a_{\vec{x}\sigma}^{\dagger}a_{\vec{x}\sigma}$. The subscript $\vec{x} \in \lbrack - N^{1/d},N^{1/d}\rbrack^{d}$ indexes one of N lattice sites in d dimensions, and σ ∈ {−1,1} is a spin-½ index. Explicitly,

$\begin{matrix} {{H = {{\sum\limits_{\overset{\rightarrow}{x},\overset{\rightarrow}{y},\sigma}{{T\left( {\overset{\rightarrow}{x} - \overset{\rightarrow}{y}} \right)}a_{\overset{\rightarrow}{x}\sigma}^{\dagger}a_{\overset{\rightarrow}{y}\sigma}}} + {\sum\limits_{\overset{\rightarrow}{x},\sigma}{{U\left( {\overset{\rightarrow}{x},\sigma} \right)}n_{\overset{\rightarrow}{x}\sigma}}} + {\sum\limits_{{({\overset{\rightarrow}{x},\sigma})} \neq {({\overset{\rightarrow}{y},\sigma^{\prime}})}}{{V\left( {\overset{\rightarrow}{x} - \overset{\rightarrow}{y}} \right)}n_{\overset{\rightarrow}{x}\sigma}n_{\overset{\rightarrow}{y}\sigma^{\prime}}}}}},} & (43) \end{matrix}$ where the coefficients T({right arrow over (s)}), U({right arrow over (s)},σ), V({right arrow over (s)}) are real functions of the lattice index {right arrow over (s)} ∈ [−N^(1/d),N^(1/d)]^(d).

Further simplification of Eq. (43) is possible as the kinetic energy operator is diagonal in the plane-wave basis. This basis is related to the dual basis by a unitary rotation FFFT, an acronym for ‘Fast-Fermionic-Fourier-Transform’, which implements a Fourier transform over the lattice site indices, resulting in Fermionic creation and annihilation operators $c_{\vec{p}\sigma}^{\dagger}$, $c_{\vec{p}\sigma}$.

$\begin{matrix} {{c_{\overset{\rightarrow}{p}\sigma} = {{\frac{1}{\sqrt{N}}{\sum\limits_{\overset{\rightarrow}{x}}{a_{\overset{\rightarrow}{x}\sigma}e^{i2\pi{\overset{\rightarrow}{p} \cdot {\overset{\rightarrow}{x}/N^{1/d}}}}}}} = {{FFFT}^{\dagger}a_{\overset{\rightarrow}{p}\sigma}{FFFT}}}},} & (44) \end{matrix}$ $\begin{matrix} {{c_{\overset{\rightarrow}{p}\sigma}^{\dagger} = {{\frac{1}{\sqrt{N}}{\sum\limits_{\overset{\rightarrow}{x}}{a_{\overset{\rightarrow}{x}\sigma}^{\dagger}e^{{- i}2\pi{\overset{\rightarrow}{p} \cdot {\overset{\rightarrow}{x}/N^{1/d}}}}}}} = {{FFFT}^{\dagger}a_{\overset{\rightarrow}{p}\sigma}^{\dagger}{FFFT}}}},} & (45) \end{matrix}$ By substituting the Fourier transform of the kinetic term

${{T\left( \overset{\rightarrow}{s} \right)} = {\frac{1}{N}{\sum_{\overset{\rightarrow}{p}}{{\overset{\sim}{T}\left( \overset{\rightarrow}{p} \right)}e^{{- i}2\pi{\overset{\rightarrow}{p} \cdot {\overset{\rightarrow}{s}/N^{1/d}}}}}}}},$ an equivalent expression for the Hubbard Hamiltonian is

$\begin{matrix} {{H = {{{FFFT}^{\dagger} \cdot \left( {\sum\limits_{\overset{\rightarrow}{x},\sigma}{{\overset{\sim}{T}\left( \overset{\rightarrow}{x} \right)}n_{\overset{\rightarrow}{x}\sigma}}} \right) \cdot {FFFT}} + {\sum\limits_{\overset{\rightarrow}{x},\sigma}{{U\left( {\overset{\rightarrow}{x},\sigma} \right)}n_{\overset{\rightarrow}{x}\sigma}}} + {\sum\limits_{{({\overset{\rightarrow}{x},\sigma})} \neq {({\overset{\rightarrow}{y},\sigma^{\prime}})}}{{V\left( {\overset{\rightarrow}{x} - \overset{\rightarrow}{y}} \right)}n_{\overset{\rightarrow}{x}\sigma}n_{\overset{\rightarrow}{y}\sigma^{\prime}}}}}},} & (46) \end{matrix}$ where each term is now diagonal in their respective bases.

A simulation of this Hamiltonian on a qubit quantum computer requires a map from its Fermionic operators to spin operators. One possibility is the Jordan-Wigner transformation, which requires some map from Fermionic indices to spin indices, such as

${f\left( {\overset{\rightarrow}{x},\sigma} \right)} = {{N\frac{1 - \sigma}{2}} + {\left( {\sum\limits_{j = 0}^{d - 1}{{\overset{\rightarrow}{x}}_{j}N^{j/d}}} \right).}}$ Subsequently, one can replace

$\begin{matrix} {a_{\vec{x}\sigma}^{\dagger}\rightarrow\frac{1}{2}\left( X_{f(\vec{x},\sigma)} - iY_{f(\vec{x},\sigma)} \right)\underset{j = 0}{\overset{f(\vec{x},\sigma) - 1}{\otimes}}Z_{j},\quad a_{\vec{x}\sigma}\rightarrow\frac{1}{2}\left( X_{f(\vec{x},\sigma)} + iY_{f(\vec{x},\sigma)} \right)\underset{j = 0}{\overset{f(\vec{x},\sigma) - 1}{\otimes}}Z_{j}.} & (47) \end{matrix}$ Note the very useful property that number operators map to single-site spin operators $n_{\vec{x}\sigma}\rightarrow\frac{1}{2}\left( I - Z_{f(\vec{x},\sigma)} \right)$ under this encoding. Moreover, FFFT can be implemented using $\mathcal{O}\left( N\log(N) \right)$ primitive quantum gates in the Jordan-Wigner representation.
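The properties claimed for the mapping in Eq. (47) can be verified directly for a few modes. The sketch below assumes NumPy and a flat mode index f in place of f(x,σ); it builds the Jordan-Wigner operators as dense matrices and checks the canonical anticommutation relations and the identity n→½(I−Z). The helper names are illustrative.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)

def kron_all(ops):
    """Tensor product of a list of single-qubit operators, qubit 0 first."""
    return reduce(np.kron, ops)

def creation(f, n_modes):
    """a_f^dagger -> Z string on qubits 0..f-1 times (X_f - iY_f)/2, as in Eq. (47)."""
    return kron_all([Z] * f + [(X - 1j * Y) / 2] + [I2] * (n_modes - f - 1))

def anticommutator(P, Q):
    return P @ Q + Q @ P

n_modes = 3
dim = 2 ** n_modes
a_dag = [creation(f, n_modes) for f in range(n_modes)]
a = [op.conj().T for op in a_dag]

car_ok = all(
    np.allclose(anticommutator(a[i], a_dag[j]), np.eye(dim) * float(i == j))
    and np.allclose(anticommutator(a[i], a[j]), 0)
    for i in range(n_modes) for j in range(n_modes)
)
number_ok = all(
    np.allclose(a_dag[f] @ a[f],
                (np.eye(dim) - kron_all([I2] * f + [Z] + [I2] * (n_modes - f - 1))) / 2)
    for f in range(n_modes)
)
print(car_ok, number_ok)
```

Because the Z strings cancel in a†a, the number operators act on a single qubit, which is what makes U and V diagonal in the Pauli Z basis later in this section.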

Now evaluate the worst-case gate complexity for time-evolution by H in the Schrödinger picture. As an example of the state of the art, using the truncated Taylor series approach in Section X A, e^(−i(T+U+V)t) may be simulated using $\mathcal{O}\left( \left( \alpha_{T} + \alpha_{U} + \alpha_{V} \right)t\log(1/\epsilon) \right)$ queries to oracles that encode T, U, and V.

$\begin{matrix} {\left( \left\langle 0 \right|_{a} \otimes I_{s} \right)O_{T}\left( \left| 0 \right\rangle_{a} \otimes I_{s} \right) = \frac{T}{\alpha_{T}},\quad\left( \left\langle 0 \right|_{a} \otimes I_{s} \right)O_{U}\left( \left| 0 \right\rangle_{a} \otimes I_{s} \right) = \frac{U}{\alpha_{U}},\quad\left( \left\langle 0 \right|_{a} \otimes I_{s} \right)O_{V}\left( \left| 0 \right\rangle_{a} \otimes I_{s} \right) = \frac{V}{\alpha_{V}}.} & (48) \end{matrix}$ The cost of simulation depends strongly on the coefficients $\tilde{T}(\vec{p})$, $U(\vec{x})$, $V(\vec{x})$. The most straightforward approach synthesizes these oracles using the linear combination of unitaries outlined in Eq. (6). For instance, O_(T)=(PREP_(T)^(†)⊗FFFT^(†))·SEL_(T)·(PREP_(T)⊗FFFT), where

$\begin{matrix} {{{{PREP}_{T}\left. ❘0 \right\rangle_{a}} = {\sum\limits_{\overset{\rightarrow}{p},\sigma}{\sqrt{\frac{\overset{\sim}{T}\left( \overset{\rightarrow}{p} \right)}{\alpha_{T}}}\left. ❘{\overset{\rightarrow}{p},\sigma} \right\rangle_{a}}}},{{SEL}_{T} = {\sum\limits_{\overset{\rightarrow}{p},\sigma}{\left. ❘{\overset{\rightarrow}{p},\sigma} \right\rangle{\left\langle {\overset{\rightarrow}{p},\sigma}❘ \right._{a} \otimes n_{\overset{\rightarrow}{p}\sigma}}}}},{\alpha_{T} = {\sum\limits_{\overset{\rightarrow}{p},\sigma}{{❘{\overset{\sim}{T}\left( \overset{\rightarrow}{p} \right)}❘}.}}}} & (49) \end{matrix}$ and similarly for U and V. As there are

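$\mathcal{O}(N)$ distinct coefficients in the worst case, each of PREP_(T,U,V) costs $\mathcal{O}(N)$. As V has $\mathcal{O}\left( N^{2} \right)$ terms, SEL_(V) has the largest cost of $\mathcal{O}\left( N^{2} \right)$. Thus the overall gate complexity is $\mathcal{O}\left( N^{2}\left( \alpha_{T} + \alpha_{U} + \alpha_{V} \right)t\log(1/\epsilon) \right)$. As there are $\mathcal{O}\left( N^{2} \right)$ coefficients, max{α_(T), α_(U), α_(V)}=$\mathcal{O}\left( \alpha_{T} + N^{2} \right)$, and so the cost of simulation is

$\begin{matrix} {\mathcal{O}\left( N^{2}\left( \alpha_{T} + N^{2} \right)t\log(1/\epsilon) \right).} & (50) \end{matrix}$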

The worst-case gate-complexity may be substantially improved by instead simulating H in the interaction picture using the truncated Dyson series algorithm in Section III.

The key idea is to simulate in the rotating frame of the interactions e^(−i(U+V)t), where the Hamiltonian becomes a time-dependent H_(I)(t)=e^(i(U+V)t)Te^(−i(U+V)t). Using the same oracle O_(T) in Eq. (48) for the kinetic term, the cost of time-evolution e^(−i(T+U+V)t) by this technique is given by Eq. (39):

$\begin{matrix} \begin{matrix} {C_{TDS} = \mathcal{O}\left( \left( C_{T} + C_{e^{i(U + V)/\alpha_{T}}}\lbrack\epsilon\rbrack + n_{a} \right)\alpha_{T}t\,\text{polylog}\left( \| U + V\|,\alpha_{T},t,\epsilon \right) \right)} \\ {= \mathcal{O}\left( \left( N + C_{e^{i(U + V)/\alpha_{T}}}\lbrack\epsilon\rbrack \right)\alpha_{T}t\,\text{polylog}\left( N,\| U\|,\| V\|,\alpha_{T},t,\epsilon \right) \right).} \end{matrix} & (51) \end{matrix}$ All that remains is to bound the cost $C_{e^{i(U + V)/\alpha_{T}}}\lbrack\epsilon\rbrack$ of time-evolution by U+V. As U+V is diagonal in the Pauli Z basis, this evolution may be fast-forwarded and so has a cost that is independent of the evolution time. The most straightforward approach decomposes

$\begin{matrix} {e^{i(U + V)t} = \left( \prod\limits_{\vec{x},\sigma}e^{iU(\vec{x},\sigma)n_{\vec{x}\sigma}t} \right)\left( \prod\limits_{(\vec{x},\sigma) \neq (\vec{y},\sigma')}e^{iV(\vec{x} - \vec{y})n_{\vec{x}\sigma}n_{\vec{y}\sigma'}t} \right).} & (52) \end{matrix}$ There are $\mathcal{O}\left( N^{2} \right)$ exponentials, and so $C_{e^{i(U + V)t}}\lbrack\epsilon\rbrack = \mathcal{O}\left( N^{2} \right)$. Compared to Eq. (50), one can already see an improvement by a factor $\mathcal{O}(N)$ in cases where the kinetic energy is extensive, that is $\alpha_{T} = \mathcal{O}(N)$, so $C_{TDS} = \mathcal{O}\left( N^{2}\alpha_{T}t\,\text{polylog}\left( N,\alpha_{T},t,\epsilon \right) \right)$.

A further improvement to

$\begin{matrix} {C_{TDS} = \mathcal{O}\left( N\alpha_{T}t\,\text{polylog}\left( N,\alpha_{T},t,\epsilon \right) \right)} & (53) \end{matrix}$ is possible by a more creative evaluation of the gate complexity of e^(i(U+V)t) that reduces its cost from $\mathcal{O}\left( N^{2} \right)$ to $\mathcal{O}\left( N\log(N) \right)$. Clearly, $C_{e^{iUt}} = \mathcal{O}(N)$ with N commuting terms poses no problem. The difficulty lies in constructing time-evolution by the two-body term e^(iVt) such that $C_{e^{iVt}} = \mathcal{O}\left( N\log N \right)$. As V is a sum of $\mathcal{O}\left( N^{2} \right)$ commuting terms, a gate cost of $\mathcal{O}\left( N^{2} \right)$ appears unavoidable. However, this may be reduced by exploiting the translation symmetry of its coefficients with a discrete Fourier transform. As $V(\vec{x}) = V( - \vec{x})$ is real and symmetric, its discrete Fourier transform $\tilde{V}(\vec{k}) = \sum_{\vec{x}}V(\vec{x})e^{i2\pi\vec{x} \cdot \vec{k}/N^{1/d}}$ only has real coefficients. One may rewrite V from Eq. (43) as

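$\begin{matrix} \begin{matrix} {V = \sum\limits_{(\vec{x},\sigma) \neq (\vec{y},\sigma')}V(\vec{x} - \vec{y})n_{\vec{x}\sigma}n_{\vec{y}\sigma'}} \\ {= \sum\limits_{(\vec{x},\sigma) \neq (\vec{y},\sigma')}\frac{1}{N}\sum\limits_{\vec{k}}\tilde{V}(\vec{k})e^{- i2\pi(\vec{x} - \vec{y}) \cdot \vec{k}/N^{1/d}}n_{\vec{x}\sigma}n_{\vec{y}\sigma'}} \\ {= \sum\limits_{\vec{k}}\frac{\tilde{V}(\vec{k})}{N}\left( \sum\limits_{(\vec{x},\sigma),(\vec{y},\sigma')}e^{- i2\pi(\vec{x} - \vec{y}) \cdot \vec{k}/N^{1/d}}n_{\vec{x},\sigma}n_{\vec{y},\sigma'} - \sum\limits_{\vec{x},\sigma}n_{\vec{x}\sigma} \right)} \\ {= \sum\limits_{\vec{k}}\tilde{V}(\vec{k})\underbrace{\left( \frac{1}{\sqrt{N}}\sum\limits_{\vec{x}}e^{- i2\pi\vec{x} \cdot \vec{k}/N^{1/d}}\sum\limits_{\sigma}n_{\vec{x},\sigma} \right)}_{{\tilde{\chi}}_{\vec{k}}}\underbrace{\left( \frac{1}{\sqrt{N}}\sum\limits_{\vec{y}}e^{i2\pi\vec{y} \cdot \vec{k}/N^{1/d}}\sum\limits_{\sigma'}n_{\vec{y},\sigma'} \right)}_{{\tilde{\chi}}_{\vec{k}}^{\dagger}} - \sum\limits_{\vec{x},\sigma}\left( \frac{1}{N}\sum\limits_{\vec{k}}\tilde{V}(\vec{k}) \right)n_{\vec{x},\sigma}.} \end{matrix} & (54) \end{matrix}$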
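The Fourier rewriting of V can be sanity-checked classically, since V is diagonal on occupation-number basis states. The sketch below makes several simplifying assumptions that are not in the text: a one-dimensional periodic lattice (d=1), a random symmetric V, occupations given as a classical bit array, and NumPy's FFT in place of reversible arithmetic. It compares the direct quadratic-cost sum with the evaluation through Ṽ and |χ̃_k|².

```python
import numpy as np

def two_body_energy_direct(V, n):
    """Sum over (x,s) != (y,s') of V(x-y) n_{x,s} n_{y,s'} on a periodic 1D lattice."""
    N = len(V)
    m = n.sum(axis=1)                       # site occupancy m_x = sum_sigma n_{x,sigma}
    total = sum(V[(x - y) % N] * m[x] * m[y] for x in range(N) for y in range(N))
    return float(total - V[0] * n.sum())    # remove the excluded (x,s) == (y,s') terms

def two_body_energy_fft(V, n):
    """Same quantity via the discrete Fourier transform, using O(N log N) arithmetic."""
    N = len(V)
    m = n.sum(axis=1)
    V_tilde = np.fft.fft(V).real            # real because V(x) = V(-x)
    chi = np.fft.fft(m) / np.sqrt(N)        # chi_k = N^{-1/2} sum_x e^{-2 pi i x k/N} m_x
    return float(np.sum(V_tilde * np.abs(chi) ** 2) - V_tilde.sum() / N * n.sum())

rng = np.random.default_rng(1)
N = 8
base = rng.normal(size=N)
V = base + base[(-np.arange(N)) % N]        # enforce V(x) = V(-x)
n = rng.integers(0, 2, size=(N, 2))         # occupations n_{x,sigma} in {0, 1}

e_direct = two_body_energy_direct(V, n)
e_fft = two_body_energy_fft(V, n)
print(e_direct, e_fft)
```

The second function follows the same order of operations as the quantum circuit described afterwards: add the two spin occupations per site, Fourier transform, take the squared magnitude, and weight by Ṽ.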

The strategy for implementing e^(−iVt) is based on the following observation: Suppose one had a unitary oracle $O_{\vec{A}}\left| j \right\rangle\left| 0 \right\rangle_{o}\left| 0 \right\rangle_{garb} = \left| j \right\rangle\left| A_{j} \right\rangle_{o}\left| g(j) \right\rangle_{garb}$ that, on input $\left| j \right\rangle \in {\mathbb{C}}^{\dim\lbrack\vec{A}\rbrack}$, outputs on the l-qubit o register the value of the j^(th) element of some complex vector $\vec{A}$, together with some garbage state |g(j)⟩_(garb) of lesser interest required to make the operation reversible. One may then perform a phase rotation that depends on A_(j) as follows:

$\begin{matrix} {{\left. ❘j \right\rangle\left. ❘0 \right\rangle_{o}\left. ❘0 \right\rangle_{garb}\left. ❘0 \right\rangle}\underset{O_{\overset{\rightarrow}{A}}}{\rightarrow}{{\left. ❘j \right\rangle\left. ❘A_{j} \right\rangle_{o}\left. ❘{g(j)} \right\rangle_{garb}\left. ❘0 \right\rangle}\underset{PHASE}{\rightarrow}{{e^{{- {iA}_{j}}t}\left. ❘j \right\rangle\left. ❘A_{j} \right\rangle_{o}\left. ❘{g(j)} \right\rangle_{garb}\left. ❘0 \right\rangle}\underset{O_{\overset{\rightarrow}{A}}^{\dagger}}{\rightarrow}{e^{{- {iA}_{j}}t}\left. ❘j \right\rangle\left. ❘0 \right\rangle_{o}\left. ❘0 \right\rangle_{garb}{\left. ❘0 \right\rangle.}}}}} & (55) \end{matrix}$ If A_(j) were represented in binary, say, A_(j)=Σ_(k=0) ^(l−1)q_(k)2^(−k), PHASE could be implemented using

$\mathcal{O}(l)$ controlled-phase rotations |0⟩⟨0|⊗I+|1⟩⟨1|⊗e^(−it2^(−k)Z).
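The bit-by-bit construction of PHASE amounts to the observation that a phase depending on a binary fraction factorizes over its bits. The following sketch shows only this arithmetic, not the circuit: it assumes a classical bit list q_0, q_1, … for A_(j) and accumulates one phase factor per set bit. The function name and sample values are hypothetical.

```python
import cmath

def phase_from_bits(bits, t):
    """Accumulate e^{-i A t} for A = sum_k q_k 2^{-k}, one factor per bit."""
    phase = 1 + 0j
    for k, q in enumerate(bits):
        if q:  # the rotation associated with bit k acts only when q_k = 1
            phase *= cmath.exp(-1j * t * 2.0 ** (-k))
    return phase

bits = [1, 0, 1, 1]                               # A = 1 + 1/4 + 1/8
t = 0.7
A_value = sum(q * 2.0 ** (-k) for k, q in enumerate(bits))
bitwise = phase_from_bits(bits, t)
direct = cmath.exp(-1j * A_value * t)
print(A_value, abs(bitwise - direct))
```

With l bits of precision, l such factors are needed, which matches the rotation count stated above.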

Thus, one can construct a unitary O_(V,binary) with the property that

$\begin{matrix} {O_{V,\text{binary}}\left( \underset{\vec{x},\sigma}{\otimes}\left| n_{\vec{x},\sigma} \right\rangle \right)\left| 0 \right\rangle\left| 0 \right\rangle_{garb} = \left( \underset{\vec{x},\sigma}{\otimes}\left| n_{\vec{x},\sigma} \right\rangle \right)\left| f(\vec{n}) \right\rangle\left| g(\vec{n}) \right\rangle_{garb};\quad f(\vec{n}) = \sum\limits_{(\vec{x},\sigma) \neq (\vec{y},\sigma')}V(\vec{x} - \vec{y})n_{\vec{x},\sigma}n_{\vec{y},\sigma'},} & (56) \end{matrix}$ where the value $f(\vec{n})$ is encoded in $l = \mathcal{O}\left( \log(1/\epsilon) \right)$ bits. This is implemented by the following sequence, where the garbage register is omitted for clarity.

$\begin{matrix} {{{\left( {\underset{\overset{\rightarrow}{x},\sigma}{\otimes}\left. ❘n_{\overset{\rightarrow}{x}\sigma} \right\rangle} \right)\left. ❘0 \right\rangle}\underset{ADD}{\rightarrow}{{\underset{\overset{\rightarrow}{x}}{\otimes}\left. ❘{\sum\limits_{\sigma}n_{\overset{\rightarrow}{x}\sigma}} \right\rangle}\underset{FFT}{\rightarrow}{{\underset{\overset{\rightarrow}{k}}{\otimes}\left. ❘{\overset{\sim}{\chi}}_{\overset{\rightarrow}{k}} \right\rangle}\underset{{❘ \cdot ❘}^{2}}{\rightarrow}{{\underset{\overset{\rightarrow}{k}}{\otimes}\left. ❘{❘{\overset{\sim}{\chi}}_{\overset{\rightarrow}{k}}❘}^{2} \right\rangle}\underset{\times V_{k}}{\rightarrow}{\underset{\overset{\rightarrow}{k}}{\otimes}\left. ❘{{V\left( \overset{\rightarrow}{k} \right)}{❘{\overset{\sim}{\chi}}_{\overset{\rightarrow}{k}}❘}^{2}} \right\rangle}}}}},} & (57) \end{matrix}$ The cost of O_(V,binary) may be expressed in term of the four standard reversible arithmetical operations, addition, subtraction, division, and multiplication, which each cost

$\mathcal{O}\left( \text{poly}(l) \right)$ primitive gates. The first step, ADD, adds $\mathcal{O}(N)$ pairs of bits $n_{\vec{x},\sigma = 1} + n_{\vec{x},\sigma = - 1}$ and costs $\mathcal{O}(N)$ arithmetic operations. The second step, FFT, is a d-dimensional Fast-Fourier-Transform on $\mathcal{O}(N)$ binary numbers and requires $\mathcal{O}\left( N\log(N) \right)$ arithmetic operations. The third step computes the absolute-value-squared of $\mathcal{O}(N)$ binary numbers, and uses $\mathcal{O}(N)$ arithmetic operations. The last step multiplies each $|{\tilde{\chi}}_{k}|^{2}$ with the corresponding V_(k), and costs $\mathcal{O}(N)$ arithmetic operations. This last step may actually be avoided by rescaling the time parameter e^(−iA_(j)t)→e^(−iA_(j)V_(k)t) in Eq. (55). Thus the total cost of O_(V,binary) is $\mathcal{O}\left( N\log(N)\text{poly}(l) \right) = \mathcal{O}\left( N\log(1/\epsilon) \right)$. Using one query each to O_(V,binary) and O_(V,binary)^(†), and $\mathcal{O}\left( N\log(1/\epsilon) \right)$ primitive quantum gates, one may thus apply e^(−iVt) with a phase error $\mathcal{O}(\epsilon)$ for a fixed value of t.

A. Application to Quantum Chemistry in the Plane-Wave Basis

The Hamiltonian that generates time-evolution for a state |ψ(t)⟩ of interacting electrons in d=3 dimensions consists of three operators: the electron kinetic energy T, the electron-nuclei potential energy U, and the electron-electron potential energy V. It has been demonstrated that this electronic structure Hamiltonian is a special case of the general Hubbard Hamiltonian of Eq. (46). In the plane-wave basis,

$\begin{matrix} {i\partial_{t}\left| \psi(t) \right\rangle = H\left| \psi(t) \right\rangle} & (58) \end{matrix}$ $H_{P} = \frac{1}{2}\sum\limits_{\vec{p},\sigma}|{\vec{k}}_{\vec{p}}|^{2}c_{\vec{p},\sigma}^{\dagger}c_{\vec{p},\sigma} + \frac{4\pi}{\Omega}\sum\limits_{\vec{p} \neq \vec{q},\, j,\sigma}\left( - \zeta_{j}\frac{e^{i{\vec{k}}_{\vec{q} - \vec{p}} \cdot {\vec{R}}_{j}}}{|{\vec{k}}_{\vec{p} - \vec{q}}|^{2}} \right)c_{\vec{p},\sigma}^{\dagger}c_{\vec{q},\sigma} + \frac{2\pi}{\Omega}\sum\limits_{(\vec{p},\sigma) \neq (\vec{q},\sigma'),\,\vec{v} \neq 0}\frac{c_{\vec{p},\sigma}^{\dagger}c_{\vec{q},\sigma'}^{\dagger}c_{\vec{q} + \vec{v},\sigma'}c_{\vec{p} - \vec{v},\sigma}}{|{\vec{k}}_{\vec{v}}|^{2}},$ where ${\vec{k}}_{\vec{p}} = 2\pi\vec{p}/\Omega^{1/3}$, $\vec{p} \in \lbrack - N^{1/3},N^{1/3}\rbrack^{3}$, ${\vec{r}}_{\vec{p}} = \vec{p}(\Omega/N)^{1/3}$, Ω represents the volume of the simulation, and ζ_(j) is the nuclear charge of the j^(th) nucleus. Whereas T is diagonal here, one may find an alternate basis where U and V are diagonal. This is the dual basis, defined through the unitary transform FFFT of Eq. (44). In this basis, let us define the state |ψ_(D)(t)⟩=FFFT^(†)|ψ(t)⟩, which evolves under the Hamiltonian H, which is exactly that of Eq. (46), with coefficients

$\begin{matrix} {\tilde{T}(\vec{p}) = \frac{1}{2}|{\vec{k}}_{\vec{p}}|^{2},\quad U(\vec{p}) = - \frac{4\pi}{\Omega}\sum\limits_{\vec{v} \neq 0,\, j}\frac{\zeta_{j}\cos\left\lbrack {\vec{k}}_{\vec{v}} \cdot \left( {\vec{R}}_{j} - {\vec{r}}_{\vec{p}} \right) \right\rbrack}{|{\vec{k}}_{\vec{v}}|^{2}},} & (59) \end{matrix}$ $V(\vec{s}) = \frac{2\pi}{\Omega}\sum\limits_{\vec{v} \neq 0}\frac{\cos\left\lbrack {\vec{k}}_{\vec{v}} \cdot {\vec{r}}_{\vec{s}} \right\rbrack}{|{\vec{k}}_{\vec{v}}|^{2}}.$

Thus the cost of time-evolution by the electronic structure Hamiltonian e^(−iHt) using the interaction picture is given by Eq. (53). The only parameter that depends on the problem is the normalization factor

$\begin{matrix} {{\alpha_{T} = {{\sum\limits_{\overset{\rightarrow}{p},\sigma}\frac{{❘{\overset{\rightarrow}{k}}_{\overset{\rightarrow}{p}}❘}^{2}}{2}} = {{\mathcal{O}\left( {\int_{0}^{N^{1/3}}{\frac{p^{2}}{\Omega^{2/3}}\left( {4\pi p^{2}} \right){dp}}} \right)} = {{\mathcal{O}\left( \frac{N^{5/3}}{\Omega^{2/3}} \right)}.}}}}\,} & (60) \end{matrix}$ Thus the total gate complexity of time-evolution under the assumption of constant density (i.e. N/Ω∈O(1)) is

$\begin{matrix} {C_{TDS} = \mathcal{O}\left( \frac{N^{8/3}}{\Omega^{2/3}}t\,\text{polylog}\left( N,\alpha_{T},t,\epsilon \right) \right) = \tilde{\mathcal{O}}\left( N^{2}t\log(1/\epsilon) \right).} & (61) \end{matrix}$ In contrast, the prior-art simulation in the plane-wave dual basis (Ryan Babbush, Nathan Wiebe, Jarrod McClean, James McClain, Hartmut Neven, and Garnet Kin Chan, “Low depth quantum simulation of electronic structure,” arXiv preprint arXiv:1706.00023, 2017) applies the ‘Qubitization’ technique (see Guang Hao Low and Isaac L Chuang, “Hamiltonian simulation by qubitization,” arXiv preprint arXiv:1610.06546, 2016) and has gate complexity that scales like Õ(N^(11/3)t), and also has a polynomial dependence on the nuclear charge ζ_(j). Embodiments of the disclosed method outperform this, and notably depend only poly-logarithmically on the nuclear charges within the problem.

VI. Application to Sparse Hamiltonian Simulation

In this section, a complexity-theoretic perspective is presented on the improvements that are enabled by simulation with the truncated Dyson series and simulation in the interaction picture. One can do so by evaluating the query complexity for the simulation of sparse Hamiltonians H. Such Hamiltonians of dimension N are called d-sparse if there are at most $d = \mathcal{O}\left( \text{polylog}(N) \right)$ non-zero entries in every row, and the positions and values of these entries may be efficiently output, in say a binary representation, by some classical circuit of size $\mathcal{O}\left( \text{polylog}(N) \right)$. This abstract model is useful in quantum complexity theory, as a natural generalization of these classical circuits leads to unitary quantum oracles that can be queried to access the same information, but now in superposition. With this model, one can achieve in Section VI A time-dependent simulation with a square-root improvement with respect to the sparsity parameter, and gate complexity scaling with the average instead of worst-case rate-of-change $\|\dot{H}\|$. By moving to the interaction picture in Section VI B, one can find more efficient time-independent simulation algorithms for diagonally-dominant Hamiltonians.

This model assumes that the Hamiltonian is input to the simulation routine through two oracles: O_(f) and O_(H). O_(H) is straightforward; it provides the values of the matrix elements of the Hamiltonian given a time index |t⟩ and indices |x⟩, |y⟩ for the row and column of H as follows:

$\begin{matrix} {O_{H}\left| t,x,y,0 \right\rangle = \left| t,x,y,H_{xy}(t) \right\rangle.} & (62) \end{matrix}$ O_(f) provides the locations of the non-zero matrix elements in any given row or column of H. Specifically, let ƒ(x,j) give the column index of the j^(th) non-zero matrix element in row x if it exists and an appropriately chosen zero element if it does not. In particular, let r_(t,j) be the list of column indices of these non-zero matrix elements in row j. One can then define, with a time-index t,

$\begin{matrix} {O_{f}\left| t,x,j \right\rangle = \left| t,x,f_{t}(x,j) \right\rangle.} & (63) \end{matrix}$

A. Simulation of Sparse Time-Dependent Hamiltonians

Applying the truncated Dyson series simulation algorithm to sparse Hamiltonians requires us to synthesize HAM-T in Eq. (7) from these oracles O_(H), O_(f). This is possible by a straightforward construction.

Lemma 4 (Synthesis of HAM-T from sparse Hamiltonian oracles). Let an N×N time-dependent d-sparse Hamiltonian H(s) with max-norm ∥H∥_(max):=max_(s)∥H(s)∥_(max) be defined on the interval s ∈[0, t] to n_(p) bits of precision. Then

$\begin{matrix} {\left( \left\langle 0 \right|_{a} \otimes I_{s} \right)\text{HAM-T}\left( \left| 0 \right\rangle_{a} \otimes I_{s} \right) = \sum\limits_{t}\left| t \right\rangle\left\langle t \right|_{d} \otimes \frac{H(t)}{d\| H\|_{\max}},} & (64) \end{matrix}$ can be implemented with $\mathcal{O}(1)$ queries to O_(f) and O_(H), and $\mathcal{O}\left( \text{poly}\left( n_{p} \right) + \log(N) \right)$ primitive gates.

Proof. Let U_(col), U_(row) be the following unitary transformations

$\begin{matrix} {U_{col}\left| t \right\rangle_{d}\left| k \right\rangle_{s}\left| 0 \right\rangle_{a}:=\left| t \right\rangle_{d}\left| \chi_{k}(t) \right\rangle = \frac{1}{\sqrt{d}}\sum\limits_{p \in r_{k}}\left| t \right\rangle_{d}\left| k \right\rangle_{s}\left| p \right\rangle_{a_{1}}\left( \sqrt{\frac{H_{k,p}^{*}(t)}{\| H\|_{\max}}}\left| 0 \right\rangle_{a_{2}} + \sqrt{1 - \frac{|H_{k,p}(t)|}{\| H\|_{\max}}}\left| 1 \right\rangle_{a_{2}} \right),} & (65) \end{matrix}$ $\begin{matrix} {\left\langle 0 \right|_{a}\left\langle j \right|_{s}\left\langle t \right|_{d}U_{row}^{\dagger}:=\left\langle {\bar{\chi}}_{j}(t) \right|\left\langle t \right|_{d} = \frac{1}{\sqrt{d}}\sum\limits_{q \in r_{j}}\left( \sqrt{\frac{\delta_{j,q}H_{j,q}(t) + \left( 1 - \delta_{j,q} \right)H_{j,q}(t)}{\| H\|_{\max}}}\left\langle 0 \right|_{a_{2}} + \sqrt{1 - \frac{|H_{q,j}(t)|}{\| H\|_{\max}}}\left\langle 2 \right|_{a_{2}} \right)\left\langle j \right|_{a_{1}}\left\langle q \right|_{s}\left\langle t \right|_{d},} & (66) \end{matrix}$ $\begin{matrix} {\left\langle {\bar{\chi}}_{j}(t) \middle| \chi_{k}(t) \right\rangle = \frac{\sqrt{H_{j,k}(t)H_{k,j}^{*}(t)}}{d\| H\|_{\max}} = \frac{H_{j,k}(t)}{d\| H\|_{\max}}.} & (67) \end{matrix}$ Let |ψ⟩=Σ_(t,k)a_(t,k)|t⟩_(d)|k⟩_(s). One then has that

$\begin{matrix} \begin{matrix} {\left\lbrack \left| 0 \right\rangle\left\langle 0 \right|_{a} \otimes I \right\rbrack U_{row}^{\dagger} \cdot U_{col}\left| \psi \right\rangle\left| 0 \right\rangle_{a} = \left| 0 \right\rangle\left\langle 0 \right|_{a} \otimes \sum\limits_{t',j}\left( \left| t' \right\rangle\left\langle t' \right|_{d} \otimes \left| j \right\rangle\left\langle j \right|_{s} \right)\sum\limits_{t,k}a_{t,k}U_{row}^{\dagger} \cdot U_{col}\left| t \right\rangle_{d}\left| k \right\rangle_{s}\left| 0 \right\rangle_{a}} \\ {= \left| 0 \right\rangle_{a}\sum\limits_{t',j}\left| t' \right\rangle_{d}\left| j \right\rangle_{s}\sum\limits_{t,k}a_{t,k}\left( \left\langle {\bar{\chi}}_{j}(t') \right|\left\langle t' \right|_{d} \right)\left( \left| t \right\rangle_{d}\left| \chi_{k}(t) \right\rangle \right)} \\ {= \sum\limits_{t,j,k}a_{t,k}\left| 0 \right\rangle_{a}\left| t \right\rangle_{d}\left| j \right\rangle_{s}\frac{H_{j,k}(t)}{d\| H\|_{\max}} = \frac{\left| 0 \right\rangle_{a}H(t)\left| \psi \right\rangle}{d\| H\|_{\max}}.} \end{matrix} & (68) \end{matrix}$ As this result holds for any input state |ψ⟩, the choice HAM-T=U_(row)^(†)·U_(col) satisfies Eq. (64).

The query cost then follows from the fact that U_(col) and U_(row)^(†) can be implemented using O(1) calls to O_(ƒ). In particular, U_(col) can be prepared in the following steps:

$\begin{matrix} \begin{matrix} {|t\rangle|k\rangle|0\rangle \mapsto |t\rangle|k\rangle\frac{1}{\sqrt{d}}\sum\limits_{\ell = 1}^{d}|\ell\rangle|0\rangle} \\ {\mapsto |t\rangle|k\rangle\frac{1}{\sqrt{d}}\sum\limits_{p \in r_{k}}|p\rangle|0\rangle} \\ {\mapsto |t\rangle|k\rangle\frac{1}{\sqrt{d}}\sum\limits_{p \in r_{k}}|p\rangle|H_{k,p}(t)\rangle|0\rangle} \\ {\mapsto |t\rangle|k\rangle\frac{1}{\sqrt{d}}\sum\limits_{p \in r_{k}}|p\rangle|H_{k,p}(t)\rangle\left( \sqrt{\frac{H_{k,p}^{*}(t)}{\|H\|_{\max}}}|0\rangle + \sqrt{1 - \frac{|H_{k,p}(t)|}{\|H\|_{\max}}}|1\rangle \right)} \\ {\mapsto |t\rangle|k\rangle\frac{1}{\sqrt{d}}\sum\limits_{p \in r_{k}}|p\rangle\left( \sqrt{\frac{H_{k,p}^{*}(t)}{\|H\|_{\max}}}|0\rangle + \sqrt{1 - \frac{|H_{k,p}(t)|}{\|H\|_{\max}}}|1\rangle \right)|0\rangle = U_{col}|t\rangle|k\rangle|0\rangle.} \end{matrix} & (69) \end{matrix}$

Therefore accessing U_(col) requires O(1) queries to the fundamental oracles as claimed, along with an arithmetic circuit, of size polynomial in the number of bits used to represent |H_(k,p)(t)⟩, for computing trigonometric functions of the magnitudes of the complex-valued matrix elements as well as their arguments. The argument that U_(row)^(†) requires O(1) queries follows in exactly the same manner, but with an additional final step that swaps the s and a₁ registers.□
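The overlap identity of Eq. (67) can be checked numerically on a small example. The following is a minimal classical sketch, not part of the disclosed quantum circuits: it builds the amplitude vectors of |χ_(k)⟩ and ⟨χ̄_(j)| at one fixed time for a randomly generated Hermitian matrix and compares their overlaps with H/(d∥H∥_(max)). The function names `build_chi_states` and `cyclic_hermitian`, the cyclic tridiagonal test matrix (d=3), and the use of a three-level a₂ register are assumptions made only for illustration.

```python
import numpy as np

def cyclic_hermitian(N, rng):
    """Hypothetical test input: a Hermitian matrix with d = 3 nonzeros per row."""
    H = np.zeros((N, N), dtype=complex)
    for k in range(N):
        H[k, k] = rng.normal()
        z = rng.normal() + 1j * rng.normal()
        H[k, (k + 1) % N] = z
        H[(k + 1) % N, k] = np.conj(z)
    # rows[k] lists the column indices of the nonzero entries in row k
    rows = [[(k - 1) % N, k, (k + 1) % N] for k in range(N)]
    return H, rows

def build_chi_states(H, rows):
    """Amplitudes of |chi_k> and <chi_bar_j| at a fixed time.

    Both arrays have shape (N, N, N, 3): (state label, s, a1, a2).
    chi_bar stores bra amplitudes directly (no extra conjugation).
    """
    N = H.shape[0]
    d = len(rows[0])
    h_max = np.abs(H).max()
    chi = np.zeros((N, N, N, 3), dtype=complex)
    chi_bar = np.zeros((N, N, N, 3), dtype=complex)
    for k in range(N):
        for p in rows[k]:
            # H*_{k,p} equals H_{p,k} for Hermitian H; indexing H[p, k]
            # avoids a signed-zero branch flip in the complex square root.
            chi[k, k, p, 0] = np.sqrt(H[p, k] / h_max)
            chi[k, k, p, 1] = np.sqrt(1 - abs(H[k, p]) / h_max)
    for j in range(N):
        for q in rows[j]:
            chi_bar[j, q, j, 0] = np.sqrt(H[j, q] / h_max)
            chi_bar[j, q, j, 2] = np.sqrt(1 - abs(H[q, j]) / h_max)
    return chi / np.sqrt(d), chi_bar / np.sqrt(d), h_max

rng = np.random.default_rng(0)
H, rows = cyclic_hermitian(5, rng)
chi, chi_bar, h_max = build_chi_states(H, rows)
d = len(rows[0])
# overlap[j, k] = <chi_bar_j | chi_k>
overlap = np.einsum("jsab,ksab->jk", chi_bar, chi)
print(np.allclose(overlap, H / (d * h_max)))
```

The |1⟩ and ⟨2| components never overlap, so only the |0⟩ components of the a₂ register contribute, which is why the block encoded by U_(row)^(†)·U_(col) reduces to the rescaled matrix elements.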

Once HAM-T is obtained, the complexity of simulation follows directly from previous results.

Theorem 4 (Simulation of sparse time-dependent Hamiltonians). Let an N×N time-dependent Hamiltonian H(s) for s ∈ [0, t] with max-norm ∥H∥_(max):=max_(s)∥H(s)∥_(max) and average rate-of-change $\langle\|\dot{H}\|\rangle = \frac{1}{t}\int_{0}^{t}\left\| \frac{dH(s)}{ds} \right\| ds$ be encoded in the oracles O_(H) and O_(ƒ) from Eqs. (62) and (63) to n_(p) bits of precision. Let α=d∥H∥_(max) and τ=tα. Then the time-ordered evolution operator $\mathcal{T}\left\lbrack \exp\left( - i\int_{0}^{t}H(s)ds \right) \right\rbrack$ may be approximated with error ϵ using

1. Queries to O_(H) and O_(ƒ): $\mathcal{O}\left( \tau\frac{\log(\tau/\epsilon)}{\log\log(\tau/\epsilon)} \right)$.

2. Qubits: $\mathcal{O}\left( n_{p} + \log\left( \frac{N_{\tau}}{\epsilon\alpha^{2}}\left( \frac{\langle\|\dot{H}\|\rangle}{\alpha} + \frac{\|H\|^{2}}{\alpha^{2}} \right) \right) \right)$.

3. Primitive gates: $\mathcal{O}\left( \text{poly}(n_{p}) + \log\left( \frac{N_{\tau}}{\epsilon\alpha^{2}}\left( \frac{\langle\|\dot{H}\|\rangle}{\alpha} + \frac{\|H\|^{2}}{\alpha^{2}} \right) \right)\frac{\log(\tau/\epsilon)}{\log\log(\tau/\epsilon)} \right)$.

Proof. From Theorem 1, the number of qubits required is O(n_(s)+n_(a)+n_(d)+log log (1/ϵ)). Values for these parameters are obtained from the construction of HAM-T in Lemma 4, which also requires an additional n_(p) qubits for the bits of precision to which H is encoded. In this construction, n_(a)=O(n_(s))=O(log (N)). The value of n_(d) is obtained from the number of time-discretization points required by Theorem 1. Simulation for time t is implemented by simulating segments of duration Θ(α⁻¹). As there are Θ(tα) segments, one can rescale ϵ→Θ(ϵ/(tα)).□

This is a quadratic improvement in sparsity d. Furthermore, instead of scaling with the worst-case O(log (max_(s)∥Ḣ(s)∥)), one can obtain scaling with the average rate-of-change ⟨∥Ḣ∥⟩. If one further assumes that the computed matrix elements H_(j,k) are not exact, the number of bits of precision scales like n_(p)=O(log (∥H∥t/ϵ)). Note that several generic improvements to Theorem 4 are possible, but will not be pursued further as they are straightforward. For instance, if ∥H(t)∥_(max) as a function of time is known, one may use step sizes of varying size by encoding each segment t ∈[t_(j),t_(j+1)] with the segment maximum max_(t∈[t_(j),t_(j+1)])∥H(t)∥_(max), rather than the worst-case max_(t)∥H(t)∥_(max).
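As a rough illustration of the query scaling stated in Theorem 4, the following sketch evaluates τ log(τ/ϵ)/log log(τ/ϵ) with τ=td∥H∥_(max). The function name `sparse_td_query_scaling` is hypothetical, and the constant hidden by the O(·) notation is set to one, which is an assumption; the value only indicates how the count grows and is not a resource estimate.

```python
import math

def sparse_td_query_scaling(d, h_max, t, eps):
    """Leading-order query scaling of Theorem 4 with the hidden constant set to 1."""
    alpha = d * h_max          # alpha = d * ||H||_max
    tau = t * alpha            # tau = t * alpha
    x = tau / eps
    if x <= math.e:
        # log(log(x)) must be positive for the expression to be meaningful
        raise ValueError("tau/eps must exceed e for this asymptotic expression")
    return tau * math.log(x) / math.log(math.log(x))

# Hypothetical parameters: sparsity 4, unit max-norm, time 10, error 1e-3
estimate = sparse_td_query_scaling(d=4, h_max=1.0, t=10.0, eps=1e-3)
print(round(estimate, 2))
```

Doubling d in this expression doubles τ, so the count grows slightly faster than linearly in the sparsity because of the logarithmic factor.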

B. Simulation of Sparse Time-Independent Hamiltonians in the Interaction Picture

This section concerns time-independent d-sparse Hamiltonians H=A+B where A is diagonal and B_(kk)=0 for all k. In particular, consider the case of diagonally dominant Hamiltonians, where ∥A∥≥d∥B∥_(max). Given the norms ∥A∥ and ∥B∥_(max) of these terms, it is straightforward to simulate time-evolution e^(−iHt) in the Schrödinger picture. For instance, using the truncated Taylor series approach in Eq. (38), one obtains a query complexity of O(t(d∥B∥_(max)+∥A∥) polylog(t,d,∥A∥,∥B∥_(max), ϵ)). By instead simulating H_(I)(t)=e^(iAt)Be^(−iAt) in the interaction picture, the dependence on ∥A∥ can be removed, which is particularly useful in cases of strong diagonal dominance ∥A∥≥d∥B∥_(max), of which the Hubbard model with long-ranged interactions in Section V is an example. Similar to the disclosed results for time-dependent sparse Hamiltonian simulation in Section VI.A, this is proven by mapping the input oracles O_(H) and O_(ƒ) for matrix values and positions to the oracles of Theorem 3 for the more general result.

Theorem 5 (Simulation of sparse diagonally dominant Hamiltonians). Let an N×N time-independent Hamiltonian H=A+B, with max-norm ∥A∥ for the diagonal component A and max-norm ∥B∥_(max) for the off-diagonal component B, be encoded in the oracles O_(H) and O_(ƒ) from Eqs. (62) and (63) to n_(p) bits of precision. Let α_(B)=d∥B∥_(max). Then the time-evolution operator e^(−iHt) may be approximated with error ϵ using

1. Queries to O_(H) and O_(ƒ): $\mathcal{O}\left( \alpha_{B}t\frac{\log(\alpha_{B}t/\epsilon)}{\log\log(\alpha_{B}t/\epsilon)} \right)$.

2. Qubits: $\mathcal{O}\left( n_{p} + \log(N) + \log\left( \frac{t}{\epsilon}\frac{\|A\|}{\alpha_{B}} \right) \right)$.

3. Primitive gates: $\mathcal{O}\left( \left( \log(N) + \text{poly}(n_{p})\log\left( \frac{t}{\epsilon}\frac{\|A\|}{\alpha_{B}} \right) \right)\alpha_{B}t\frac{\log(\alpha_{B}t/\epsilon)}{\log\log(\alpha_{B}t/\epsilon)} \right)$.

Proof. This follows immediately by combining the query complexity of Lemma 3 with respect to e^(−iAt) and the HAM-T that encodes the Hamiltonian H_(I)(t)=e^(iAt)Be^(−iAt) in the rotating frame, with the query complexity of the approach in Theorem 3 for synthesizing these oracles using the input oracles O_(H) and O_(ƒ). One possible decomposition of HAM-T is

$\begin{matrix} {\text{HAM-T} = \left( \sum\limits_{m = 0}^{M - 1}|m\rangle\langle m|_{d} \otimes I_{a} \otimes e^{iA\tau m/M} \right)\left( I_{d} \otimes O_{B} \right)\left( \sum\limits_{m = 0}^{M - 1}|m\rangle\langle m|_{d} \otimes I_{a} \otimes e^{- iA\tau m/M} \right),\quad\left( \langle 0|_{a} \otimes I_{s} \right)O_{B}\left( |0\rangle_{a} \otimes I_{s} \right) = \frac{B}{\alpha_{B}},} & (70) \end{matrix}$

where α_(B)=d∥B∥_(max), τ=O(α_(B)⁻¹) and $M = \mathcal{O}\left( \frac{t}{\epsilon}\frac{\|A\|}{\alpha_{B}} \right)$. Note that with this construction,

$\left( \langle 0|_{a} \otimes I_{s} \right)\text{HAM-T}\left( |0\rangle_{a} \otimes I_{s} \right) = \sum\limits_{m = 0}^{M - 1}|m\rangle\langle m|_{d} \otimes \frac{H_{I}(\tau m/M)}{d\|B\|_{\max}}.$

First, synthesize (Σ_(m=0)^(M−1)|m⟩⟨m|_(d)⊗e^(iAτm/M)) using O(1) queries to the input oracles O_(H), and O(log(N)+n_(p) log(M)) primitive gates. Since A is diagonal, e^(−iAt) can be simulated for any t>0 using only two queries. This is implemented by the following steps:

$\begin{matrix} {|k\rangle|0\rangle \mapsto |k\rangle|k\rangle|0\rangle\underset{O_{H}}{\mapsto}|k\rangle|k\rangle|H_{k,k}\rangle|0\rangle \mapsto |k\rangle|k\rangle|H_{k,k}\rangle e^{- iH_{k,k}Zt}|0\rangle = e^{- iH_{k,k}t}|k\rangle|k\rangle|H_{k,k}\rangle|0\rangle\underset{O_{H}^{\dagger}}{\mapsto}e^{- iH_{k,k}t}|k\rangle|k\rangle|0\rangle \mapsto e^{- iH_{k,k}t}|k\rangle|0\rangle.} & (71) \end{matrix}$

Step one uses n_(s)=O(log (N)) CNOT gates to copy the computational basis state |k⟩. Step three applies O(n_(p)) phase rotations with angle controlled by the bits of |H_(k,k)⟩, and the value of t, which is given beforehand. Subsequently, (Σ_(m=0)^(M−1)|m⟩⟨m|_(d)⊗I_(a)⊗e^(iAτm/M)) may be implemented by a sequence of rotations with angles increasing in a geometric series, each controlled by a different qubit in the d register, e.g. controlled-$e^{iA\tau 2^{-1}/M}$, $e^{iA\tau 2^{-2}/M}$, . . . , $e^{iA\tau 2^{-n+p}/M}$. Naively, this requires O(log (M)) queries. However, it is only necessary to compute |H_(k,k)⟩ once, as the entire sequence of controlled-phases may be applied after step three. Similarly, it is only necessary to copy the computational basis state O(1) times.
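The way a ladder of controlled rotations reproduces the multiplexed phase Σ_(m)|m⟩⟨m|_(d)⊗e^(iAτm/M) can be illustrated classically. The sketch below assumes that the time index m is stored as an unsigned binary integer in which bit b carries weight 2^b and that M is a power of two; this indexing convention, the function name `controlled_phase_ladder`, and the parameter values are assumptions made for illustration and may differ from the indexing used in the text above.

```python
import numpy as np

def controlled_phase_ladder(a_diag, tau, n_bits):
    """Phases applied to each basis state |m>|k> by one controlled rotation per bit of m.

    Returns an array of shape (M, N) whose entry [m, k] is the accumulated phase.
    """
    M = 2 ** n_bits
    # One diagonal rotation per control qubit; angles double from bit to bit.
    rotations = [np.exp(1j * a_diag * tau * (2 ** b) / M) for b in range(n_bits)]
    phases = np.ones((M, len(a_diag)), dtype=complex)
    for m in range(M):
        for b in range(n_bits):
            if (m >> b) & 1:          # rotation b fires only when bit b of m is set
                phases[m] *= rotations[b]
    return phases

a_diag = np.array([-3.0, 0.5, 2.0, 7.5])   # hypothetical diagonal entries A_kk
tau, n_bits = 0.8, 4
ladder = controlled_phase_ladder(a_diag, tau, n_bits)
M = 2 ** n_bits
direct = np.exp(1j * np.outer(np.arange(M), a_diag) * tau / M)
print(np.allclose(ladder, direct))
```

Only log₂(M) distinct rotations are needed, and each depends on A only through the diagonal value A_(kk), which is consistent with the remark that |H_(k,k)⟩ needs to be computed once.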

Second, synthesize O_(B) using O(1) queries to the input oracles O_(H) and O_(ƒ). How this is done should be clear from Lemma 4, by omitting the time-index, and preparing the state 0|0⟩_(a₂)+1|1⟩_(a₂) in Eq. (65) when the input indices k=p. This has gate complexity O(poly(n_(p))+log (N)). Thus e^(−iAτ) and HAM-T combined have query complexity O(1) to O_(H) and O_(ƒ), and gate complexity O(poly(n_(p)) log (M)+log (N)). By substituting into Lemma 3, one can obtain the stated results.□

This provides a formal proof that the query complexity of simulating a Hamiltonian, within the interaction picture, is independent of the magnitude of the diagonal elements of the Hamiltonian.

VII. General Embodiments

In this section, example methods for performing aspects of the disclosed embodiments are disclosed. The particular embodiments described should not be construed as limiting, as the disclosed method acts can be performed alone, in different orders, or at least partially simultaneously with one another. Further, any of the disclosed methods or method acts can be performed with any other methods or method acts disclosed herein.

FIG. 6 is an example method for a time-dependent simulation algorithm as disclosed herein.

In FIG. 6, the inputs are shown at 610. The inputs include a quantum state, a subroutine (S) for computing H(t), an evolution time (t), a number of segments (r), and an order (K). At 612, a determination is made as to whether any segments remain. If so, the process proceeds to 614, where a robust oblivious amplitude amplification procedure is applied. If the procedure of 614 is done, then the process proceeds to the next segment; if the procedure of 614 is not done, then the process proceeds to 616. At 616, a weighted quantum superposition is prepared over the first K terms in the Dyson series of the time-evolution operator. At 618, a quantum superposition is prepared over each of the times at which the Hamiltonian is evaluated. At 620, subroutine (S) is applied to construct a state that stores the columns of a rescaling of the kth term in the Dyson series, for each k=0, . . . , K. At 622, subroutine (S) is applied to construct a state that stores the rows of a rescaling of the kth term in the Dyson series, for each k=0, . . . , K. At 624, linear combinations of unitary methods are used to conditionally implement the evolution on the input state of 614.

FIG. 7 is an example method for an example interaction picture simulation method as disclosed herein.

At 710, a quantum state, Hamiltonian function, and evolution time are input. At 712, a classical computer is used to construct an interaction picture Hamiltonian by conjugating each term in the off-diagonal Hamiltonian by the evolution according to the diagonal terms. At 714, the interaction picture version of the Hamiltonian is simulated on a quantum computer using a time-dependent Hamiltonian simulation method (e.g., that of FIG. 1). At 716, the interaction picture is transformed back on the quantum computer by counter-rotating the quantum state. At 718, the resulting quantum state is output.
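The three acts at 712, 714, and 716 can be mimicked with a small classical calculation. The following is a minimal numerical sketch, assuming a diagonal A and a Hermitian B with zero diagonal: it forms H_(I)(s)=e^(iAs)Be^(−iAs), approximates the time-ordered evolution by a product of short midpoint steps (a simple stand-in chosen for illustration, not the truncated Dyson series method of the disclosure), and then counter-rotates by e^(−iAt). The function names and parameter values are hypothetical.

```python
import numpy as np

def expm_herm(H, t):
    """exp(-i H t) for a Hermitian matrix H via its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w * t)) @ V.conj().T

def interaction_picture_evolution(a_diag, B, t, steps):
    """Approximate exp(-i (A + B) t) as exp(-i A t) times a time-ordered product."""
    N = len(a_diag)
    U = np.eye(N, dtype=complex)
    dt = t / steps
    for m in range(steps):
        s = (m + 0.5) * dt                                  # midpoint of segment m
        phase = np.exp(1j * a_diag * s)
        H_I = phase[:, None] * B * phase.conj()[None, :]    # e^{iAs} B e^{-iAs}
        U = expm_herm(H_I, dt) @ U                          # later times act on the left
    return np.exp(-1j * a_diag * t)[:, None] * U            # counter-rotate by e^{-iAt}

rng = np.random.default_rng(0)
a_diag = rng.uniform(-10.0, 10.0, 4)            # dominant diagonal component A
Mx = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
B = 0.3 * (Mx + Mx.conj().T) / 2
np.fill_diagonal(B, 0.0)                        # B has zero diagonal
U_exact = expm_herm(np.diag(a_diag) + B, 1.0)
U_ip = interaction_picture_evolution(a_diag, B, 1.0, 2000)
err = np.linalg.norm(U_exact - U_ip, 2)
print(err < 1e-3)
```

The rotating-frame Hamiltonian has the same norm as B at every time, which is why the cost of the time-dependent simulation step can be governed by B rather than by A.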

FIG. 8 is an example method for simulating chemistry and Hubbard models as disclosed herein. At 810, a quantum state, Hamiltonian function, and an evolution time are input. At 812, a classical computer is used to transform to a frame where a potential operator is diagonal. At 814, a simulation method (e.g., the method of FIG. 2) is applied to a quantum state by the classical computer. And, at 816, an evolved quantum state is output (e.g., by the classical computer).

FIG. 9 is an example compression method as disclosed herein. At 910, numerous values are input, including one or more of (a) a classical variable j=1; (b) a success quantum register A; (c) a target quantum register S; (d) a counter quantum register B; (e) a counter quantum register C; (f) a set of J quantum circuits Q₁, Q₂, . . . , Q_(J), where Q_(j) applies matrix M_(j) on register S conditional on register A being in a state that indicates ‘success’. At 912, it is determined whether j=J+1. If not, then the method proceeds to 914. At 914, and conditional on B and C being in the initial quantum state, Q_(j) is applied. Then, at 916, j is incremented by 1, and conditional on A being in a ‘success’ quantum state, B is decremented and C is incremented with quantum arithmetic. 916 leads back into 912, until it is determined that j=J+1. If so, at 918, the method outputs a quantum circuit that applies on register S the product of matrices M₁ . . . M_(k) conditional on registers A and C being in the initial quantum state, and register B being in the quantum state k.

FIG. 14 illustrates an example method for performing a quantum simulation in accordance with an embodiment of the disclosed technology.

At 1410, a quantum computer is configured to simulate a quantum system, wherein a Hamiltonian in the simulation is represented in the interaction picture. At 1412, a simulation of the quantum system is performed using the quantum computer.

In certain implementations, the simulation of the quantum system is a subroutine that is repeated two or more times. In some implementations, the simulation is performed using linear combinations of matrices. For instance, in some cases, the simulation uses linear combinations of unitaries performed on a diagonally dominant matrix (e.g., using linear combinations of unitaries performed on the diagonally dominant components of the diagonally dominant matrix). In certain implementations, the quantum system is modelled by a Hubbard model. In particular implementations, the quantum system describes a physical chemical system or molecule. In some implementations, the Hamiltonian is sparse and the simulation uses a state of an auxiliary qubit to encode matrix elements of the Hamiltonian instead of using graph decomposition techniques. In certain implementations, the simulation is performed by compressing ancillas for quantum simulation of a time-dependent Hamiltonian.

FIG. 15 illustrates a further example method for performing a quantum simulation in accordance with an embodiment of the disclosed technology.

At 1510, a quantum algorithm is implemented on a quantum computer for simulating a general sparse time-dependent quantum system. In this embodiment, the quantum algorithm does not use graph decomposition techniques.

In certain implementations, a Hamiltonian used in the simulation is represented in the interaction picture. In some implementations, the simulation uses linear combinations of unitaries performed on a diagonally dominant matrix. In particular implementations, a Hamiltonian used in the simulation is represented in the interaction picture and is chosen to be a diagonal matrix. In certain implementations, the quantum system is modelled by a Hubbard model, a physical chemical system, or a molecule. In further implementations, the simulation includes compressing ancillas used to index a time for the simulation.

VIII. Example Computing Environments

FIG. 10 illustrates a generalized example of a suitable classical computing environment 1000 in which aspects of the described embodiments can be implemented. The computing environment 1000 is not intended to suggest any limitation as to the scope of use or functionality of the disclosed technology, as the techniques and tools described herein can be implemented in diverse general-purpose or special-purpose environments that have computing hardware.

With reference to FIG. 10, the computing environment 1000 includes at least one processing device 1010 and memory 1020. In FIG. 10, this most basic configuration 1030 is included within a dashed line. The processing device 1010 (e.g., a CPU or microprocessor) executes computer-executable instructions. In a multi-processing system, multiple processing devices execute computer-executable instructions to increase processing power. The memory 1020 may be volatile memory (e.g., registers, cache, RAM, DRAM, SRAM), non-volatile memory (e.g., ROM, EEPROM, flash memory), or some combination of the two. The memory 1020 stores software 1080 implementing tools for performing any of the disclosed simulation techniques in the quantum computer as described herein. The memory 1020 can also store software 1080 for synthesizing, generating, or compiling quantum circuits for performing the described simulation techniques using quantum computing devices as described herein.

The computing environment can have additional features. For example, the computing environment 1000 includes storage 1040, one or more input devices 1050, one or more output devices 1060, and one or more communication connections 1070. An interconnection mechanism (not shown), such as a bus, controller, or network, interconnects the components of the computing environment 1000. Typically, operating system software (not shown) provides an operating environment for other software executing in the computing environment 1000, and coordinates activities of the components of the computing environment 1000.

The storage 1040 can be removable or non-removable, and includes one or more magnetic disks (e.g., hard drives), solid state drives (e.g. flash drives), magnetic tapes or cassettes, CD-ROMs, DVDs, or any other tangible non-volatile storage medium which can be used to store information and which can be accessed within the computing environment 1000. The storage 1040 can also store instructions for the software 1080 implementing any of the disclosed simulation techniques in a quantum computing device. The storage 1040 can also store instructions for the software 1080 for generating and/or synthesizing any of the described techniques, systems, or quantum circuits.

The input device(s) 1050 can be a touch input device such as a keyboard, touchscreen, mouse, pen, trackball, a voice input device, a scanning device, or another device that provides input to the computing environment 1000. The output device(s) 1060 can be a display device (e.g., a computer monitor, laptop display, smartphone display, tablet display, netbook display, or touchscreen), printer, speaker, or another device that provides output from the computing environment 1000.

The communication connection(s) 1070 enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.

As noted, the various methods, techniques for controlling a quantum computing device, circuit design techniques, or compilation/synthesis techniques can be described in the general context of computer-readable instructions stored on one or more computer-readable media. Computer-readable media are any available media (e.g., memory or storage device) that can be accessed within or by a computing environment. Computer-readable media include tangible computer-readable memory or storage devices, such as memory 1020 and/or storage 1040, and do not include propagating carrier waves or signals per se (tangible computer-readable memory or storage devices do not include propagating carrier waves or signals per se).

Various embodiments of the methods disclosed herein can also be described in the general context of computer-executable instructions (such as those included in program modules) being executed in a computing environment by a processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, and so on, that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing environment.

An example of a possible network topology 1100 (e.g., a client-server network) for implementing a system according to the disclosed technology is depicted in FIG. 11 . Networked computing device 1120 can be, for example, a computer running a browser or other software connected to a network 1112. The computing device 1120 can have a computer architecture as shown in FIG. 10 and discussed above. The computing device 1120 is not limited to a traditional personal computer but can comprise other computing hardware configured to connect to and communicate with a network 1112 (e.g., smart phones, laptop computers, tablet computers, or other mobile computing devices, servers, network devices, dedicated devices, and the like). Further, the computing device 1120 can comprise an FPGA or other programmable logic device. In the illustrated embodiment, the computing device 1120 is configured to communicate with a computing device 1130 (e.g., a remote server, such as a server in a cloud computing environment) via a network 1112. In the illustrated embodiment, the computing device 1120 is configured to transmit input data to the computing device 1130, and the computing device 1130 is configured to implement a technique for controlling a quantum computing device according to any of the disclosed embodiments and/or a circuit generation/compilation/synthesis technique for generating quantum circuits for performing any of the techniques disclosed herein. The computing device 1130 can output results to the computing device 1120. Any of the data received from the computing device 1130 can be stored or displayed on the computing device 1120 (e.g., displayed as data on a graphical user interface or web page at the computing devices 1120). In the illustrated embodiment, the illustrated network 1112 can be implemented as a Local Area Network (LAN) using wired networking (e.g., the Ethernet IEEE standard 802.3 or other appropriate standard) or wireless networking (e.g. 
one of the IEEE standards 802.11a, 802.11b, 802.11g, or 802.11n or other appropriate standard). Alternatively, at least part of the network 1112 can be the Internet or a similar public network and operate using an appropriate protocol (e.g., the HTTP protocol).

Another example of a possible network topology 1200 (e.g., a distributed computing environment) for implementing a system according to the disclosed technology is depicted in FIG. 12. Networked computing device 1220 can be, for example, a computer running a browser or other software connected to a network 1212. The computing device 1220 can have a computer architecture as shown in FIG. 10 and discussed above. In the illustrated embodiment, the computing device 1220 is configured to communicate with multiple computing devices 1230, 1231, 1232 (e.g., remote servers or other distributed computing devices, such as one or more servers in a cloud computing environment) via the network 1212. In the illustrated embodiment, each of the computing devices 1230, 1231, 1232 in the computing environment 1200 is used to perform at least a portion of a technique for controlling a quantum computing device according to any of the disclosed embodiments and/or a circuit generation/compilation/synthesis technique for generating quantum circuits for performing any of the techniques disclosed herein. In other words, the computing devices 1230, 1231, 1232 form a distributed computing environment in which aspects of the techniques as disclosed herein and/or quantum circuit generation/compilation/synthesis processes are shared across multiple computing devices. The computing device 1220 is configured to transmit input data to the computing devices 1230, 1231, 1232, which are configured to distributively implement such a process, including performance of any of the disclosed methods or creation of any of the disclosed circuits, and to provide results to the computing device 1220. Any of the data received from the computing devices 1230, 1231, 1232 can be stored or displayed on the computing device 1220 (e.g., displayed as data on a graphical user interface or web page at the computing device 1220). 
The illustrated network 1212 can be any of the networks discussed above with respect to FIG. 11 .

With reference to FIG. 13 , an exemplary system for implementing the disclosed technology includes computing environment 1300. In computing environment 1300, a compiled quantum computer circuit description (including quantum circuits configured to perform any of the disclosed simulation techniques as disclosed herein) can be used to program (or configure) one or more quantum processing units such that the quantum processing unit(s) implement the circuit described by the quantum computer circuit description.

The environment 1300 includes one or more quantum processing units 1302 and one or more readout device(s) 1308. The quantum processing unit(s) execute quantum circuits that are precompiled and described by the quantum computer circuit description. The quantum processing unit(s) can be one or more of, but not limited to: (a) a superconducting quantum computer; (b) an ion trap quantum computer; (c) a fault-tolerant architecture for quantum computing; and/or (d) a topological quantum architecture (e.g., a topological quantum computing device using Majorana zero modes). The precompiled quantum circuits, including any of the disclosed circuits, can be sent into (or otherwise applied to) the quantum processing unit(s) via control lines 1306 at the control of quantum processor controller 1320. The quantum processor controller (QP controller) 1320 can operate in conjunction with a classical processor 1310 (e.g., having an architecture as described above with respect to FIG. 10) to implement the desired quantum computing process. In the illustrated example, the QP controller 1320 further implements the desired quantum computing process via one or more QP subcontrollers 1304 that are specially adapted to control a corresponding one of the quantum processor(s) 1302. For instance, in one example, the quantum controller 1320 facilitates implementation of the compiled quantum circuit by sending instructions to one or more memories (e.g., lower-temperature memories), which then pass the instructions to low-temperature control unit(s) (e.g., QP subcontroller(s) 1304) that transmit, for instance, pulse sequences representing the gates to the quantum processing unit(s) 1302 for implementation. In other examples, the QP controller(s) 1320 and QP subcontroller(s) 1304 operate to provide appropriate magnetic fields, encoded operations, or other such control signals to the quantum processor(s) to implement the operations of the compiled quantum computer circuit description. 
The quantum controller(s) can further interact with readout devices 1308 to help control and implement the desired quantum computing process (e.g., by reading or measuring out data results from the quantum processing units once available, etc.).

With reference to FIG. 13 , compilation is the process of translating a high-level description of a quantum algorithm into a quantum computer circuit description comprising a sequence of quantum operations or gates, which can include the circuits as disclosed herein (e.g., the circuits configured to perform one or more simulations as disclosed herein). The compilation can be performed by a compiler 1322 using a classical processor 1310 (e.g., as shown in FIG. 10 ) of the environment 1300 which loads the high-level description from memory or storage devices 1312 and stores the resulting quantum computer circuit description in the memory or storage devices 1312.

In other embodiments, compilation and/or verification can be performed remotely by a remote computer 1360 (e.g., a computer having a computing environment as described above with respect to FIG. 10) which stores the resulting quantum computer circuit description in one or more memory or storage devices 1362 and transmits the quantum computer circuit description to the computing environment 1300 for implementation in the quantum processing unit(s) 1302. Still further, the remote computer 1360 can store the high-level description in the memory or storage devices 1362 and transmit the high-level description to the computing environment 1300 for compilation and use with the quantum processor(s). In any of these scenarios, results from the computation performed by the quantum processor(s) can be communicated to the remote computer after and/or during the computation process. Still further, the remote computer can communicate with the QP controller(s) 1320 such that the quantum computing process (including any compilation, verification, and QP control procedures) can be remotely controlled by the remote computer 1360. In general, the remote computer 1360 communicates with the QP controller(s) 1320, compiler/synthesizer 1322, and/or verification tool 1323 via communication connections 1350.

In particular embodiments, the environment 1300 can be a cloud computing environment, which provides the quantum processing resources of the environment 1300 to one or more remote computers (such as remote computer 1360) over a suitable network (which can include the internet).

IX. Summation

The dynamics of interest in many condensed matter systems occur in the low-energy subspace of the Hamiltonian. This work demonstrates that by simulating quantum dynamics in the interaction picture, the cost of simulation on a quantum computer for a large class of Hamiltonians can be made to scale with the low-energy component. This is especially useful when the dynamics at high energy scales are, in a certain sense, simple, but nevertheless are necessary as they couple to low energy scales. This represents a significant advance compared to state-of-art time-independent quantum simulation algorithms that generally scale with the spectral norm of the full Hamiltonian. Finding further improvements in simulation algorithms specialized for low-energy subspaces remains a major open problem. Embodiments of the disclosed approach are complementary to alternatives based on spectral gap amplification, and the possibility of combining these methods is an interesting future direction.

More generally, the complexity of time-independent quantum simulation for generic Hamiltonians, given minimal information, appears to have been resolved. Thus future advancements, such as this work, will likely focus on exploiting the detailed structure of Hamiltonians of interest. For example, the result for time-dependent sparse Hamiltonian simulation has scaling $\mathcal{O}\left(d\|H\|_{\max}t\right)$, but improvement to $\mathcal{O}\left(\sqrt{d\|H\|_{\max}\|H\|_{1}t}\right)$ is straightforward if the induced one-norm ∥H∥₁ of the Hamiltonian is also known beforehand. The promise of results in similar directions is exemplified by recent work that exploits the geometric locality of interactions, or the sizes of different terms in a Hamiltonian. The challenge will be finding characterizations of Hamiltonians that are sufficiently specific so as to enable a speedup, yet sufficiently general so as to include problems of practical and scientific value.

X. Appendix

A. The Truncated Taylor Series Algorithm

The truncated Taylor series simulation algorithm was a major advance in quantum simulation for its conceptual simplicity and computational efficiency. The original algorithm is motivated by truncating the Taylor expansion of the time-evolution operator at degree K.

$\begin{matrix} {e^{-iHt} = 1 - iHt + \frac{(-iHt)^{2}}{2!} + \frac{(-iHt)^{3}}{3!} + \cdots = \underbrace{\sum_{k=0}^{K}\frac{(-iHt)^{k}}{k!}}_{\bar{R}_{K}} + \underbrace{\sum_{k=K+1}^{\infty}\frac{(-iHt)^{k}}{k!}}_{R_{K}}.} & (72) \end{matrix}$ Assuming that t>0 and that the truncation order K≥2∥H∥t, the norms of the truncated series $\bar{R}_{K}$ and the remainder term $R_{K}$ are bounded by

$\begin{matrix} {{{{\overset{\_}{R}}_{K}} = {{{e^{- {iHt}} - R_{K}}} \leq {1 + {R_{K}}}}},{{{R_{K}} \leq {\sum\limits_{k = {K + 1}}^{\infty}\frac{\left( {{H}t} \right)^{k}}{k!}} \leq {\frac{\left( {{H}t} \right)^{K + 1}}{\left( {K + 1} \right)!}{\sum\limits_{k = {K + 2}}^{\infty}\left( {1/2} \right)^{k - K - 1}}}} = {\frac{2\left( {{H}t} \right)^{K + 1}}{\left( {K + 1} \right)!}.}}} & (73) \end{matrix}$ Thus any unitary quantum circuit TTS that acts jointly on registers α,b,s and applies the non-unitary operator (

00|_(ab)⊗I_(s))TTS(|00

_(ab)⊗I_(s))≈R _(K) approximates the time-evolution operator with error δ and failure probability p given by

$\begin{matrix} {{\delta = {{{e^{- {iHt}} - {\overset{\_}{R}}_{K}}} = {{R_{K}} \leq \frac{2\left( {{H}t} \right)^{K + 1}}{\left( {K + 1} \right)!}}}},{{p \leq {1 - {\min\limits_{{❘\psi\rangle}_{s}}{❘\frac{{\overset{\_}{R}}_{K}\left. ❘\psi \right\rangle_{s}}{1 + {R_{K}}}❘}^{2}}}} = {{{1 - {\min\limits_{{❘\psi\rangle}_{s}}{❘\frac{\left( {e^{- {iHt}} - R_{K}} \right)\left. ❘\psi \right\rangle_{s}}{1 + {R_{K}}}❘}^{2}}} \leq {1 - {❘\frac{1 - {R_{K}}}{1 + {R_{K}}}❘}^{2}}} = {{4{R_{K}}} = {4{\delta.}}}}}} & (74) \end{matrix}$ Solving Eq. (74) for ∥H∥t=

(1) gives the required truncation order

$K = {{\mathcal{O}\left( \frac{\log\left( {1/\delta} \right)}{\log{\log\left( {1/\delta} \right)}} \right)}.}$
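The asymptotic truncation order above can be made concrete by searching for the smallest K that satisfies the remainder bound of Eq. (74). The following is a minimal sketch under stated assumptions: the function name `truncation_order` and its arguments are hypothetical and introduced only for illustration, and the search simply evaluates the bound 2(∥H∥t)^(K+1)/(K+1)! rather than any circuit.

```python
import math


def truncation_order(norm_h_t: float, delta: float) -> int:
    """Smallest K >= 2*||H||t with 2*(||H||t)^(K+1)/(K+1)! <= delta.

    norm_h_t is the product ||H|| t (hypothetical input name);
    delta is the target per-segment error.
    """
    if norm_h_t <= 0 or not 0 < delta < 1:
        raise ValueError("need norm_h_t > 0 and 0 < delta < 1")
    # The derivation of the bound assumes K >= 2 ||H|| t.
    K = math.ceil(2 * norm_h_t)
    while 2 * norm_h_t ** (K + 1) / math.factorial(K + 1) > delta:
        K += 1
    return K


# Example: one segment with ||H|| t = ln 2 and target error 1e-6.
K_example = truncation_order(math.log(2), 1e-6)
```

Because the bound decays factorially in K, tightening δ by several orders of magnitude raises K only slightly, which is the behaviour the log/loglog scaling expresses.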

The simulation algorithm TTS in FIG. 4 is obtained by constructing two oracles: HAM_(K), which applies the powers (−iH)^(k) for integers k up to k=K, and COEF_(K), which prepares a quantum state that selects these terms with the right coefficients. HAM_(K) will require additional ancilla registers, which can be indexed with $\vec{a}$ and $\vec{b}$. Note that the gate and space complexity in the truncated Taylor series algorithm is dominated by that of HAM_(K).

$\begin{matrix} {(\langle 0|_{\vec{a}}\otimes I_{s\vec{b}})\,\mathrm{HAM}_{K}\,(|0\rangle_{\vec{a}}\otimes I_{s\vec{b}}) := \sum_{k=0}^{K}|k\rangle\langle k|_{\vec{b}}\otimes(-iH)^{k},\qquad \mathrm{COEF}_{K}|0\rangle_{\vec{b}} := \frac{1}{\sqrt{\beta}}\sum_{k=0}^{K}\sqrt{\frac{t^{k}}{k!}}\,|k\rangle_{\vec{b}},\qquad \beta = \sum_{k=0}^{K}\frac{t^{k}}{k!}\le e^{t}.} & (75) \end{matrix}$

FIG. 4 depicts the quantum circuit representation of (top, left) an example implementation of HAM_(K) from Eq. (75) using K queries to controlled-HAM, (top, right) a single step of the truncated Taylor series algorithm before oblivious amplitude amplification, and (bottom) a single step of time-evolution by the truncated Taylor series algorithm from Eq. (78). Note that β=2 as a single round of oblivious amplitude amplification is used. Thin horizontal lines depict single-qubit registers. Filled circles depict a unitary controlled on the $|0\rangle$ state.

The original algorithm implements HAM_(K) using K queries to controlled-HAM

$\begin{matrix} {\text{C-HAM} := |1\rangle\langle 1|_{b}\otimes I_{as} + |0\rangle\langle 0|_{b}\otimes\mathrm{HAM}} & (76) \end{matrix}$

with K copies of registers a and b. The state $|k\rangle_{\vec{b}} = |0\rangle^{\otimes k}|1\rangle^{\otimes K-k}$ that selects desired powers of H is encoded in unary, and so COEF_(K) may be implemented using $\mathcal{O}(K)$ primitive gates. Up to a proportionality factor β, the unitaries of Eq. (75) allow us to implement the desired linear combination $\bar{R}_{K}$ for simulating time-evolution.
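As an illustration of Eq. (75) and the unary encoding just described, the amplitudes that COEF_(K) is meant to prepare can be tabulated classically. This is only a sketch: the function name `coef_amplitudes` and the use of bit strings as dictionary keys are assumptions made for illustration, and no quantum circuit is constructed.

```python
import math


def coef_amplitudes(t: float, K: int):
    """Return ({unary bit string: amplitude}, beta) for the state of Eq. (75).

    The key for index k is '0'*k + '1'*(K-k), mirroring |0>^k |1>^(K-k).
    Assumes t > 0 so that all weights t^k / k! are positive.
    """
    weights = [t ** k / math.factorial(k) for k in range(K + 1)]
    beta = sum(weights)  # normalization factor, at most e^t
    amplitudes = {
        "0" * k + "1" * (K - k): math.sqrt(w / beta)
        for k, w in enumerate(weights)
    }
    return amplitudes, beta


# Example: t = ln 2 and K = 8 gives beta just below 2.
amps, beta = coef_amplitudes(math.log(2), 8)
```

With t = ln 2 the normalization β approaches 2 from below as K grows, which is the regime in which a single round of oblivious amplitude amplification is used.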

$\begin{matrix} {\mathrm{TTS}_{\beta} := (\mathrm{COEF}_{K}^{\dagger}\otimes I_{\vec{a}s})\,\mathrm{HAM}_{K}\,(\mathrm{COEF}_{K}\otimes I_{\vec{a}s}),\qquad (\langle 0|_{\vec{a}b}\otimes I_{s})\,\mathrm{TTS}_{\beta}\,(|0\rangle_{\vec{a}b}\otimes I_{s}) = \frac{\bar{R}_{K}}{\beta}\approx\frac{e^{-iHt}}{\beta}.} & (77) \end{matrix}$

As $\bar{R}_{K}$ is close to unitary, the success probability ≈1/β² may be boosted using oblivious amplitude amplification. When β=2, a single round of oblivious amplitude amplification suffices to boost the success probability to $1-\mathcal{O}(\delta)$. Thus, one can choose $\ln 2\le t=\mathcal{O}(1)$ such that β=2. If one desires |t|<ln 2, the amplitude 1/β may be decreased, so that the effective β is again 2, by appending a single-qubit ancilla and noting that $|\langle 0|e^{i\theta X}|0\rangle| = |\cos\theta|\le 1$. Thus simulation is accomplished with the circuit

$\begin{matrix} {\mathrm{TTS} = \mathrm{TTS}_{\beta=2}\,(\mathrm{REF}_{\vec{a}b}\otimes I_{s})\,\mathrm{TTS}_{\beta=2}^{\dagger}\,(\mathrm{REF}_{\vec{a}b}\otimes I_{s})\,\mathrm{TTS}_{\beta=2},\qquad \mathrm{REF}_{\vec{a}b} = I_{\vec{a}b} - 2|0\rangle\langle 0|_{\vec{a}b}.} & (78) \end{matrix}$

This approximates time-evolution by e^(−iHt) with error $\|(\langle 0|_{\vec{a}b}\otimes I_{s})\,\mathrm{TTS}\,(|0\rangle_{\vec{a}b}\otimes I_{s}) - e^{-iHt}\| = \mathcal{O}(\delta)$. In order to simulate evolution e^(−iHT) for longer times T>t, one can apply $\mathrm{TTS}^{T/t}$, where $t=\Theta(1)$ is chosen such that T/t is an integer. The overall error

$\begin{matrix} {\epsilon = \|(\langle 0|_{\vec{a}b}\otimes I_{s})\,\mathrm{TTS}^{T/t}\,(|0\rangle_{\vec{a}b}\otimes I_{s}) - e^{-iHT}\| = \mathcal{O}(T\delta),} & (79) \end{matrix}$

and success probability $1-\mathcal{O}(\epsilon)$ may thus be controlled by choosing the error of each segment to be $\delta = \mathcal{O}\left(\frac{\epsilon}{T}\right)$. This requires a truncation order of $K = \mathcal{O}\left(\frac{\log(\alpha T/\epsilon)}{\log\log(\alpha T/\epsilon)}\right)$. One may drop the implicit assumption that ∥H∥≤1 by rescaling H→H/α, for some normalization constant α≥∥H∥. Thus simulation of e^(−iHT) requires $\mathcal{O}\left(\alpha T\frac{\log(\alpha T/\epsilon)}{\log\log(\alpha T/\epsilon)}\right)$ queries to C-HAM. Note that the gate cost of all queries to COEF_(K), at $\mathcal{O}\left(\alpha T\frac{\log(\alpha T/\epsilon)}{\log\log(\alpha T/\epsilon)}\right)$, and that of $\mathrm{REF}_{\vec{a}b}$, at $\mathcal{O}\left(n_{a}\alpha T\frac{\log(\alpha T/\epsilon)}{\log\log(\alpha T/\epsilon)}\right)$, are typically dominated by the gate cost of all applications of C-HAM.

The ancilla overhead of the truncated Taylor series algorithm, at $n_{s} + \mathcal{O}\left(n_{a}\frac{\log(1/\epsilon)}{\log\log(1/\epsilon)}\right)$ qubits, may be significantly improved by choosing the sequence of unitaries in the compression gadget of Lemma 2 of Section III C to be U_(j)=−iHAM. This straightforwardly furnishes the following result.

Corollary 1 (Hamiltonian simulation by a compressed truncated Taylor series). Let a time-independent Hamiltonian H be encoded in standard-form with normalization α and n_(s)+n_(a) qubits, as per Eq. (5). Then the truncated Taylor series algorithm approximates the time-evolution operator e^(−iHt) for any |αt|≤ln 2 to error ϵ using

1. Queries to HAM: $\mathcal{O}\left(\frac{\log(1/\epsilon)}{\log\log(1/\epsilon)}\right)$.
2. Qubits: $n_{s} + \mathcal{O}\left(n_{a} + \log\log(1/\epsilon)\right)$.
3. Primitive gates: $\mathcal{O}\left(\left(n_{a} + \log\log(1/\epsilon)\right)\frac{\log(1/\epsilon)}{\log\log(1/\epsilon)}\right)$.

For longer-time simulations e^(−iHT) of duration T>t, Corollary 1 is applied αT/ln(2) times, each with error $\mathcal{O}\left(\frac{\epsilon}{\alpha T}\right)$. This leads to a query complexity of $\mathcal{O}\left(\alpha T\frac{\log(\alpha T/\epsilon)}{\log\log(\alpha T/\epsilon)}\right)$. Though the compressed algorithm is still worse than the quantum signal processing approach, which uses $n_{s}+\mathcal{O}(n_{a})$ qubits, the technique is applicable to simulating time-dependent Hamiltonians, as demonstrated in Section III.
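The segmentation just described can be sketched numerically. The snippet below is an illustration under stated assumptions, not the disclosed implementation: the function name `segment_plan` and the returned dictionary keys are hypothetical, the per-segment truncation order is found from the remainder bound of Eq. (74), and the product of segments and K is only an order-of-magnitude query estimate with constant factors dropped.

```python
import math


def segment_plan(alpha: float, T: float, eps: float) -> dict:
    """Split a simulation of duration T into segments with alpha*t <= ln 2."""
    if alpha <= 0 or T <= 0 or not 0 < eps < 1:
        raise ValueError("need alpha > 0, T > 0 and 0 < eps < 1")
    segments = math.ceil(alpha * T / math.log(2))
    delta = eps / segments        # per-segment error budget
    x = alpha * T / segments      # rescaled ||H|| t per segment, <= ln 2
    K = math.ceil(2 * x)          # the bound assumes K >= 2 ||H|| t
    while 2 * x ** (K + 1) / math.factorial(K + 1) > delta:
        K += 1
    return {
        "segments": segments,
        "delta": delta,
        "K": K,
        # Order-of-magnitude estimate only; constant factors are dropped.
        "query_estimate": segments * K,
    }


plan = segment_plan(alpha=10.0, T=5.0, eps=1e-3)
```

The number of segments grows linearly in αT while K grows only slowly with αT/ϵ, which is the origin of the stated query complexity.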

XI. Error from Truncating and Discretizing the Dyson Series

In this section, the proof of Lemma 1 is completed for the error from truncating the Dyson series at order K, and the error from approximating its terms, which are time-ordered integrals, with Riemann sums. These results provide a rigorous upper bound on the error of time-dependent Hamiltonian simulation. Let D_(k) be the k^(th) term in the Dyson expansion, and let B_(k) be the Riemann sum of D_(k) with each dimension discretized into M=t/Δ segments.

$\begin{matrix} {\mathcal{T}\left[e^{-i\int_{0}^{t}H(s)ds}\right] = \sum_{k=0}^{\infty}(-i)^{k}D_{k} = \lim_{M\rightarrow\infty}\sum_{k=0}^{\infty}\frac{(-it)^{k}}{M^{k}}B_{k},\qquad D_{k} := \frac{1}{k!}\int_{0}^{t}\cdots\int_{0}^{t}\mathcal{T}\left[H(t_{1})\cdots H(t_{k})\right]d^{k}t,\qquad B_{k} := \sum_{0\le m_{k}<\cdots<m_{1}<M}H(m_{k}\Delta)\cdots H(m_{1}\Delta).} & (80) \end{matrix}$
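The truncated and discretized sum of Eq. (80) can be checked numerically in the simplest possible setting. The sketch below assumes a scalar (1×1) Hamiltonian h(s), for which time-ordering is trivial and the exact evolution is exp(−i∫h); in that case B_(k) reduces to the elementary symmetric polynomial of the M samples h(mΔ). The function name `discretized_dyson_scalar` is hypothetical, and the scalar restriction is an assumption made purely for illustration.

```python
import cmath
import math


def discretized_dyson_scalar(h, t: float, K: int, M: int) -> complex:
    """Evaluate sum_{k<=K} (-i t/M)^k B_k for a scalar function h(s)."""
    delta = t / M
    # b[k] accumulates B_k: the sum over strictly ordered index tuples
    # of products of k samples (an elementary symmetric polynomial).
    b = [0j] * (K + 1)
    b[0] = 1 + 0j
    for m in range(M):
        sample = h(m * delta)
        for k in range(K, 0, -1):
            b[k] += b[k - 1] * sample
    return sum((-1j * delta) ** k * b[k] for k in range(K + 1))


# Example: h(s) = cos(s) on [0, 0.5], whose integral is sin(0.5).
approx = discretized_dyson_scalar(math.cos, 0.5, K=8, M=2000)
exact = cmath.exp(-1j * math.sin(0.5))
error = abs(approx - exact)
```

Increasing M reduces the discretization error roughly in proportion to Δ=t/M, while increasing K reduces the truncation error factorially, mirroring the two error terms analyzed in this section.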

The bound on the ϵ₁ term, which is the error due to truncating the Dyson series at order K, is now proven.

Lemma 5. Let $H(s):\mathbb{R}\rightarrow\mathbb{C}^{N\times N}$ be differentiable on the domain [0,t]. For any ϵ₁ ∈ (0, 2^(−e)], an approximation to the time-ordered operator exponential of −iH(s) can be constructed such that

$\left\|\mathcal{T}\left[e^{-i\int_{0}^{t}H(s)ds}\right] - \sum_{k=0}^{K}(-i)^{k}D_{k}\right\| \le \epsilon_{1},$

if one takes

1. $K \ge \left\lceil -1 + \frac{2\ln(1/\epsilon_{1})}{\ln\ln(1/\epsilon_{1}) + 1}\right\rceil$,
2. $\max_{s}\|H(s)\|t \le \ln 2$.

Proof of Lemma 5. We start by bounding ∥D_(k)∥.

$\begin{matrix} {{{{D_{k}} = {{\frac{1}{k!}{{\int_{0}^{t}{\cdots{\int_{0}^{t}{{\mathcal{T}\left\lbrack {{H\left( t_{1} \right)}\mspace{14mu}\cdots\mspace{14mu}{H\left( t_{k} \right)}} \right\rbrack}d^{k}t}}}} }} \leq}}\quad}{\quad{\quad{{\frac{1}{k!}{{\int_{0}^{t}{\cdots{\int_{0}^{t}{\prod\limits_{j = 1}^{k}\;{{{H\left( t_{j} \right)}}d^{k}t}}}}}}} \leq {\frac{\left( {{t\max}_{s}{{H(s)}}} \right)^{k}}{k!}.}}}}} & (81) \end{matrix}$ At this point, the proof is identical to the time-independent case as max_(s) ∥H(s)∥ is independent of time. Thus using Stirling's approximation and assuming K≥2 max_(s) ∥H(s)∥|t|,

$\begin{matrix} {\epsilon_{1} = \left\|\mathcal{T}\left[e^{-i\int_{0}^{t}H(s)ds}\right] - \sum_{k=0}^{K}(-i)^{k}D_{k}\right\| \le \sum_{k=K+1}^{\infty}\|D_{k}\| \le \sum_{k=K+1}^{\infty}\frac{(t\max_{s}\|H(s)\|)^{k}}{k!} \le \frac{(t\max_{s}\|H(s)\|)^{K+1}}{(K+1)!}\sum_{k=K+1}^{\infty}(1/2)^{k-K-1} = \frac{2(t\max_{s}\|H(s)\|)^{K+1}}{(K+1)!} \le \left(\frac{te\max_{s}\|H(s)\|}{K+1}\right)^{K+1}.} & (82) \end{matrix}$ One can then find that this is in turn less than ϵ₁ if max_(s) ∥H(s)∥te<min{ln(1/ϵ₁), e ln 2}≤1, given that ϵ₁≤2^(−e), and

$\begin{matrix} {K \ge \max\left\{-1 + \frac{\ln(1/\epsilon_{1})}{W\left(\frac{\ln(1/\epsilon_{1})}{\max_{s}\|H(s)\|te}\right)},\; 2\max_{s}\|H(s)\|t\right\},} & (83) \end{matrix}$ where W is the Lambert-W function. Using the fact that for x≥1, W(x)≥(ln(x)+1)/2, and that ln(e ln 2)<1, one can obtain the simpler bound

$\begin{matrix} {K = {\left\lceil {{- 1} + \frac{2{\ln\left( {1/\epsilon_{1}} \right)}}{{{lnln}\left( {1/\epsilon_{1}} \right)} + 1}} \right\rceil = {{\mathcal{O}\left( \frac{\ln\left( {1/\epsilon_{1}} \right)}{{lnln}\left( {1/\epsilon_{1}} \right)} \right)}.}}} & (84) \end{matrix}$

The bound on the ϵ₂ term, which is the error from approximating the Dyson series with its Riemann sum, is now proven.

Lemma 6. Let $H(s):\mathbb{R}\rightarrow\mathbb{C}^{N\times N}$ be differentiable on the domain [0,t]. Define also the quantity $\langle\dot{H}\rangle := \frac{1}{t}\int_{0}^{t}\left\|\frac{dH(s)}{ds}\right\|ds$. For integer K≥0 and ϵ₂>0,

$\left\|\sum_{k=0}^{K}(-i)^{k}D_{k} - \sum_{k=0}^{K}\left(-i\frac{t}{M}\right)^{k}B_{k}\right\| \le \epsilon_{2},$

by choosing any M such that

1. $M \ge \frac{t^{2}}{\epsilon_{2}}4e^{\max_{s}\|H(s)\|t}\left(\langle\dot{H}\rangle + \max_{s}\|H(s)\|^{2}\right)$,
2. $M \ge K^{2}$.

Proof of Lemma 6. One first expands the time-ordered evolution operator using the Dyson series. One can then examine the error incurred in evaluating a given order of the Dyson series for a small hypercubic region of sidelength Δ=t/M. One can then upper bound the maximum number of such hypercubes within the allowed volume and use the triangle inequality to argue that the error is the product of the number of such hypercubes and the maximum error per hypercube.

Since H(s) is a differentiable function, it holds from Taylor's theorem that for any δ≪1 and computational basis states $|x\rangle$, $|y\rangle$,

$\begin{matrix} {\langle x|H(s+\delta)|y\rangle = \langle x|H(s)|y\rangle + \delta\langle x|\dot{H}(s)|y\rangle + o\left(\max_{s}\|\dot{H}(s)\|_{\max}\delta\right).} & (85) \end{matrix}$ Since computational basis states form a complete orthonormal basis, it follows through norm inequalities that

$\begin{matrix} {H(s+\delta) = H(s) + \delta\dot{H}(s) + o\left(\max_{s}\|\dot{H}(s)\|_{\max}N^{2}\delta\right).} & (86) \end{matrix}$ One then has from Taylor's theorem and the triangle inequality that

$\begin{matrix} {\|H(s+\delta) - H(s)\| = \left\|\sum_{j=1}^{r}\left[H(s+j\delta/r) - H(s+[j-1]\delta/r)\right]\right\| \le \left\|\sum_{j=1}^{r}\dot{H}(s+[j-1]\delta/r)\,\delta/r\right\| + r\left[o\left(\max_{s}\|\dot{H}(s)\|_{\max}N^{2}\delta/r\right)\right] \le \int_{0}^{\delta}\|\dot{H}(s)\|ds + r\left[o\left(\max_{s}\|\dot{H}(s)\|_{\max}N^{2}\delta/r\right)\right].} & (87) \end{matrix}$ Since this equation holds for all r, it also holds in the limit as r approaches infinity. Therefore

$\begin{matrix} {\|H(s+\delta) - H(s)\| \le \int_{0}^{\delta}\|\dot{H}(s)\|ds.} & (88) \end{matrix}$

Next, consider the error in approximating the integral over a hypercube to lowest order, and define the hypercube to be C with x₁, . . . , x_(q) being the corner of the hypercube with smallest norm. First note that in general if A_(j) and B_(j) are sequences of bounded operators and ∥·∥ is a sub-multiplicative norm, then it is straightforward to show using an inductive argument that for all positive integers q,

$\begin{matrix} {\left\|\prod_{j=1}^{q}A_{j} - \prod_{j=1}^{q}B_{j}\right\| \le \sum_{k=1}^{q}\left(\prod_{j=1}^{k-1}\|A_{j}\|\right)\|A_{k} - B_{k}\|\left(\prod_{j=k+1}^{q}\|B_{j}\|\right).} & (89) \end{matrix}$ By applying this in combination with Eq. (88) to region C, the error induced is

$\begin{matrix} {{{{{\int_{C}{{H\left( {x_{1} + y_{1}} \right)}\mspace{14mu}\ldots\mspace{14mu}{H\left( {x_{q} + y_{q}} \right)}d^{q}y}} - {\Delta^{q}{\prod\limits_{j = 1}^{q}{H\left( x_{j} \right)}}}}} \leq {\int_{C}{{{{\prod\limits_{j = 1}^{q}{H\left( {x_{j} + y_{j}} \right)}} - {\prod\limits_{j = 1}^{q}{H\left( x_{j} \right)}}}}{dy}^{q}}}},{\leq {\sum\limits_{k = 1}^{q}{\left( {\prod\limits_{j = 1}^{k - 1}{\int_{0}^{\Delta}{{{H\left( {x_{j} + s} \right)}}{ds}}}} \right)\;\left( {\int_{0}^{\Delta}{\int_{0}^{s}{{{\overset{.}{H}\left( {x_{k} + y} \right)}}{dyds}}}} \right)\;\left( {\prod\limits_{j = {k + 1}}^{q}{\int_{0}^{\Delta}{{{H\left( x_{j} \right)}}{ds}}}} \right)}} \leq {\sum\limits_{k = 1}^{q}{\left( {\prod\limits_{j = 1}^{k - 1}{\int_{0}^{\Delta}{{{H\left( {x_{j} + s} \right)}}{ds}}}} \right)\left( {\int_{0}^{\Delta}{\int_{0}^{s}{{{\overset{.}{H}\left( {x_{k} + y} \right)}}{dyds}}}} \right)\;\left( {\prod\limits_{j = {k + 1}}^{q}{\int_{0}^{\Delta}{\alpha\;{ds}}}} \right)}} \leq {(\alpha)^{q - 1}\Delta{\sum\limits_{k = 1}^{q}{\left( {\prod\limits_{j \neq k}^{q}{\int_{0}^{\Delta}{ds}}} \right)\left( {\int_{0}^{\Delta}{{{\overset{.}{H}\left( {x_{k} + s} \right)}}{ds}}} \right)}}}}} & (90) \\ {\mspace{79mu}{{{where}\mspace{14mu}\alpha}:={\max_{s}{{{H(s)}}.}}}} & \; \end{matrix}$

There are two regions in the problem. The first region, which one calls the bulk, is the region that satisfies all the constraints of the problem, namely bulk:={(t₁, . . . , t_(q)): ⌊t₁/Δ⌋> . . . >⌊t_(q)/Δ⌋}. Thus for any index x₁, . . . , x_(q) to a hypercube in the bulk, the ordering of terms H(x₁+t₁) . . . H(x_(q)+t_(q)) in the integrand of Eq. (90) is fixed. The second region is called the boundary, which is the region in which the hypercubes used in the Riemann sum would stretch outside the allowed region for the integral. Since one can approximate the integral to be zero on all hypercubes that intersect the boundary, the maximum error in the approximation is the maximum error that the discrete approximation to the integrand can take within the region scaled to the volume of the corresponding region.

Finally, one has from Eq. (90) that the contribution to the error from integration over the bulk of the simplex is

$\begin{matrix} {\sum_{\substack{\vec{x}\in\{0,\Delta,\ldots,(M-1)\Delta\}^{q} \\ x_{1}<x_{2}<\ldots<x_{q}}}\left\|\int_{C}H(x_{1}+t_{1})\cdots H(x_{q}+t_{q})\,d^{q}t - \Delta^{q}\prod_{j=1}^{q}H(x_{j})\right\| \le \sum_{\substack{\vec{x}\in\{0,\Delta,\ldots,(M-1)\Delta\}^{q} \\ x_{1}<x_{2}<\ldots<x_{q}}}\alpha^{q-1}\Delta\sum_{k=1}^{q}\left(\prod_{j\neq k}^{q}\int_{0}^{\Delta}ds\right)\left(\int_{0}^{\Delta}\|\dot{H}(x_{k}+s)\|ds\right).} & (91) \end{matrix}$

In order to understand how the error scales, one can examine the partial sum over x₁ for fixed k>1:

$\begin{matrix} {\sum_{x_{2},\ldots,x_{q}}\sum_{x_{1}=0}^{x_{2}-1}\int_{0}^{\Delta}\|\dot{H}(x_{1}+s)\|ds\left[\alpha^{q-1}\Delta\left(\prod_{j\neq k}^{q}\int_{0}^{\Delta}ds\right)\left(\int_{0}^{\Delta}\|\dot{H}(x_{k}+s)\|ds\right)\right] \le \sum_{x_{2},\ldots,x_{q}}\int_{0}^{x_{2}\Delta}\|\dot{H}(s)\|ds\left[\alpha^{q-1}\Delta\left(\prod_{j\neq k}^{q}\int_{0}^{\Delta}\|H(x_{j}+s)\|ds\right)\left(\int_{0}^{\Delta}\|\dot{H}(x_{k}+s)\|ds\right)\right].} & (92) \end{matrix}$ The integral then takes exactly the same form as the original integral, and so by repeating the argument q−1 times it is easy to see, even in the case where k=1, that

$\begin{matrix} {\sum_{\substack{\vec{x}\in\{0,\Delta,\ldots,(M-1)\Delta\}^{q} \\ x_{1}<x_{2}<\ldots<x_{q}}}\alpha^{q-1}\Delta\sum_{k=1}^{q}\left(\prod_{j\neq k}^{q}\int_{0}^{\Delta}ds\right)\left(\int_{0}^{\Delta}\|\dot{H}(x_{k}+s)\|ds\right) \le \alpha^{q-1}\Delta\sum_{k=1}^{q}\left(\prod_{j\neq k}^{q}\int_{0}^{x_{j+1}\Delta}ds\right)\left(\int_{0}^{x_{k+1}\Delta}\|\dot{H}(s)\|ds\right) \le \frac{(\alpha t)^{q-1}}{[q-1]!}\Delta\int_{0}^{t}\|\dot{H}(s)\|ds,} & (93) \end{matrix}$ where the definition x_(q+1):=M=t/Δ has been used, together with the fact that $\int_{0}^{x}\|H(s)\|ds$ is a monotonically increasing function of x. Thus the contribution to the error from the bulk is at most

$\begin{matrix} {\sum_{q=1}^{K}\frac{(\alpha t)^{q-1}}{[q-1]!}\Delta\int_{0}^{t}\|\dot{H}(s)\|ds \le \sum_{q=1}^{\infty}\frac{(\alpha t)^{q-1}}{[q-1]!}\Delta\int_{0}^{t}\|\dot{H}(s)\|ds = e^{\alpha t}\Delta\int_{0}^{t}\|\dot{H}(s)\|ds.} & (94) \end{matrix}$

Next, one estimates the volume of the boundary. If a point is on the boundary then by definition there exists at least one t_(j) such that ⌊t_(j)/Δ⌋=⌊t_(j+1)/Δ⌋, taking t₀=t. All other values are consistent with points within the bulk. It is then straightforward to see (after relabeling the indexes for the summation) that the volume can be expressed as the sum over all possible choices of such sums with at least one matched index. If one further assumes that q²Δmax_(s)∥H(s)∥/[αt]≤ln(2), then one has the following upper bound on the boundary contribution to the error in the Dyson series:

$\begin{matrix} {\begin{aligned} \int\prod_{j=1}^{q}\|H(t_{j})\|\,\delta_{t\in\mathrm{bdy}}\,dt^{q} &\le \sum_{p=1}^{q}\Delta^{q}\max_{s}\|H(s)\|^{p}\alpha^{q-p}\binom{q}{p}\sum_{j_{1}=1}^{t/\Delta}\sum_{j_{2}=1}^{j_{1}-1}\cdots\sum_{j_{q-p}=1}^{j_{q-p-1}-1}1 \\ &= \Delta^{q}\alpha^{q}\sum_{p=1}^{q}\max_{s}\|H(s)\|^{p}\alpha^{-p}\binom{q}{p}\binom{t/\Delta}{q-p} \\ &\le \Delta^{q}\alpha^{q}\sum_{p=1}^{q}\max_{s}\|H(s)\|^{p}\alpha^{-p}\frac{(t/\Delta)^{q}}{q!\,p!}\left(\frac{q^{2}\Delta}{t}\right)^{p} \\ &= \frac{t^{q}\alpha^{q}}{q!}\sum_{p=1}^{q}\frac{1}{p!}\left(\frac{q^{2}\Delta\max_{s}\|H(s)\|}{\alpha t}\right)^{p} \\ &\le \frac{t^{q}\alpha^{q}}{q!}\left(\frac{q^{2}\Delta\max_{s}\|H(s)\|}{\alpha t}\right)\sum_{p=0}^{\infty}\frac{1}{p!}\left(\frac{q^{2}\Delta\max_{s}\|H(s)\|}{\alpha t}\right)^{p} \\ &\le \frac{2q(t\alpha)^{q-1}}{(q-1)!}\left(\Delta\max_{s}\|H(s)\|\right). \end{aligned}} & (95) \end{matrix}$ Note that when q=1, there is no boundary contribution. Using the fact that q/(q−1)≤2, ∀q≥2, this upper bound simplifies to

$\begin{matrix} {\int\prod_{j=1}^{q}\|H(t_{j})\|\,\delta_{t\in\mathrm{bdy}}\,dt^{q} \le \frac{4\Delta\max_{s}\|H(s)\|(\alpha t)^{q-1}}{(q-2)!}.} & (96) \end{matrix}$ It then follows from summing Eq. (96) that the error is

$\begin{matrix} {\delta_{\mathrm{bdy}} \le \sum_{q=1}^{K}\int\prod_{j=1}^{q}\|H(t_{j})\|\,\delta_{t\in\mathrm{bdy}}\,dt^{q} \le \sum_{q=2}^{\infty}\frac{4\Delta\max_{s}\|H(s)\|(\alpha t)^{q-1}}{(q-2)!} \le 4\Delta\alpha t\max_{s}\|H(s)\|e^{\alpha t}.} & (97) \end{matrix}$ By adding this result to that of Eq. (94),

$\begin{matrix} {\sum_{q=1}^{K}\|D_{q} - \Delta^{q}B_{q}\| \le \delta_{\mathrm{bulk}} + \delta_{\mathrm{bdy}} \le 4\Delta te^{\max_{s}\|H(s)\|t}\left(\langle\dot{H}\rangle + \max_{s}\|H(s)\|^{2}\right).} & (98) \end{matrix}$ It follows from elementary algebra that this total error is at most ϵ₂ if

$\begin{matrix} {\Delta \le \frac{\epsilon_{2}}{4te^{\max_{s}\|H(s)\|t}\left[\langle\dot{H}\rangle + \max_{s}\|H(s)\|^{2}\right]}.} & (99) \end{matrix}$ Expressed in terms of the number of points $M = \frac{t}{\Delta}$, the total error is at most ϵ₂ if one chooses any M such that

$\begin{matrix} {M \ge \frac{t^{2}}{\epsilon_{2}}4e^{\max_{s}\|H(s)\|t}\left[\langle\dot{H}\rangle + \max_{s}\|H(s)\|^{2}\right].} & (100) \end{matrix}$ The final bound on M quoted immediately follows from

$\frac{q^{2}\Delta}{\alpha t}\max_{s}\|H(s)\| \le \ln(2) \Rightarrow K \le \sqrt{\ln(2)M} \le \sqrt{M}.$

Now that the necessary results regarding the error in the truncated Dyson series simulation have been proven, the following proof of Lemma 1 can be provided:

Proof of Lemma 1. This is proven by combining two intermediate results using a triangle inequality. The approximation error is upper-bounded by

$\begin{matrix} {\left\|\mathcal{T}\left[e^{-i\int_{0}^{t}H(s)ds}\right] - \sum_{k=0}^{K}\left(-i\frac{t}{M}\right)^{k}B_{k}\right\|} & (101) \\ {= \left\|\mathcal{T}\left[e^{-i\int_{0}^{t}H(s)ds}\right] - \sum_{k=0}^{K}(-i)^{k}D_{k} + \sum_{k=0}^{K}(-i)^{k}D_{k} - \sum_{k=0}^{K}\left(-i\frac{t}{M}\right)^{k}B_{k}\right\|} & (102) \\ {\le \underbrace{\left\|\mathcal{T}\left[e^{-i\int_{0}^{t}H(s)ds}\right] - \sum_{k=0}^{K}(-i)^{k}D_{k}\right\|}_{\epsilon_{1}} + \underbrace{\left\|\sum_{k=0}^{K}(-i)^{k}D_{k} - \sum_{k=0}^{K}\left(-i\frac{t}{M}\right)^{k}B_{k}\right\|}_{\epsilon_{2}} \le \epsilon.} & (103) \end{matrix}$ One can take the errors in both cases to obey ϵ₁=ϵ/2 and ϵ₂=ϵ/2. The result then follows by taking the most restrictive of the assumptions of Lemma 5 and Lemma 6. □

XII. Concluding Remarks

Having described and illustrated the principles of the disclosed technology with reference to the illustrated embodiments, it will be recognized that the illustrated embodiments can be modified in arrangement and detail without departing from such principles. For instance, elements of the illustrated embodiments shown in software may be implemented in hardware and vice-versa. Also, the technologies from any example can be combined with the technologies described in any one or more of the other examples. It will be appreciated that procedures and functions such as those described with reference to the illustrated examples can be implemented in a single hardware or software module, or separate modules can be provided. The particular arrangements above are provided for convenient illustration, and other arrangements can be used. 

What is claimed is:
 1. A method, comprising: configuring a quantum computer to simulate time-evolution of a quantum system represented by a time-independent Hamiltonian of the form H=A+B, wherein A and B are two non-commuting parts, a free particle theory part A and a part describing interactions B by: transforming the time-independent Hamiltonian to a time-dependent Hamiltonian, wherein the time-dependent Hamiltonian has the form H(t)=e^(iAt)Be^(−iAt); approximating a truncation and discretization of a Dyson series for the time-dependent Hamiltonian H(t) with a spectral norm α=max_(t)∥H(t)∥ and average rate-of-change $\langle\dot{H}\rangle = \frac{1}{t}\int_{0}^{t}\|\dot{H}(s)\|ds$; and performing a simulation of the quantum system using the quantum computer by implementing a circuit of quantum gates.
 2. The method of claim 1, wherein the simulation is performed using linear combinations of unitaries.
 3. The method of claim 2, wherein the simulation uses linear combinations of unitaries performed on a diagonally dominant matrix.
 4. The method of claim 3, wherein the simulation using linear combinations of unitaries is performed on diagonally dominant components of the diagonally dominant matrix.
 5. The method of claim 1, wherein the quantum system is modelled by a Hubbard model.
 6. The method of claim 1, wherein the quantum system describes a physical chemical system or molecule.
 7. The method of claim 1, wherein the Hamiltonian is sparse and the simulation uses a state of an auxiliary qubit to encode matrix elements of the Hamiltonian instead of using graph decomposition techniques.
 8. The method of claim 1, further comprising reducing the number of ancillas for quantum simulation of a time-dependent Hamiltonian by avoiding duplication of registers needed to perform the quantum simulation.
 9. A computing system, comprising: a classical computer configured to: transform a time-independent Hamiltonian of the form H=A+B wherein A and B are two non-commuting parts, a free-particle theory part, A, and a part describing interactions, B, to a time-dependent Hamiltonian in the interaction picture wherein the time-dependent Hamiltonian has the form H(t)=e^(iAt)Be^(−iAt); approximate a truncation and discretization of a Dyson series for the time-dependent Hamiltonian, H(t), with a spectral-norm α=max_(t) ∥H(t)∥ and an average rate-of-change $\langle\dot{H}\rangle = \frac{1}{t}\int_{0}^{t}\|\dot{H}(s)\|ds$; and compile a circuit of quantum gates representing the approximated truncated and discretized Dyson series; and a quantum computer configured to perform a simulation of the quantum system by implementing the circuit of quantum gates.
 10. A method, comprising: implementing a quantum algorithm on a quantum computer for simulating a general sparse time-dependent quantum system represented by a time-dependent Hamiltonian H(t), wherein the quantum algorithm does not use graph decomposition techniques by: approximating a truncation and discretization of a Dyson series for the time-dependent Hamiltonian, H(t), with a spectral-norm α=max_(t)∥H(t)∥ and average rate-of-change $\langle\dot{H}\rangle = \frac{1}{t}\int_{0}^{t}\|\dot{H}(s)\|ds$; compiling a circuit of quantum gates representing the approximated truncated and discretized Dyson series; and implementing the quantum algorithm on the quantum computer by implementing the circuit of quantum gates.
 11. The method of claim 10, wherein the time-dependent Hamiltonian used in the simulation is represented in the interaction picture.
 12. The method of claim 10, wherein the simulation uses linear combinations of unitaries performed on a diagonally dominant matrix.
 13. The method of claim 10, wherein the quantum system is modelled by a Hubbard model.
 14. The method of claim 10, wherein the quantum system describes a physical chemical system, or a molecule.
 15. The method of claim 10, further comprising reducing the number of ancillas used to index a time for the simulation by avoiding duplication of registers needed to index a time for the simulation. 